Abstract

We find a sharp combinatorial bound for the metric entropy of sets in $\mathbb{R}^n$ and general classes of functions. This solves two basic combinatorial conjectures on the empirical processes. 1. A class of functions satisfies the uniform Central Limit Theorem if the square root of its combinatorial dimension is integrable. 2. The uniform entropy is equivalent to the combinatorial dimension under minimal regularity. Our method also constructs a nicely bounded coordinate section of a symmetric convex body in $\mathbb{R}^n$. In operator theory, this essentially proves the restricted invertibility principle of Bourgain and Tzafriri for all normed spaces.



1 Introduction 

This paper develops a sharp combinatorial method for estimating the metric entropy of sets in $\mathbb{R}^n$ and, equivalently, of function classes on a probability space. The need for such estimates arises naturally in a number of problems of analysis (functional, harmonic and approximation theory), probability, combinatorics, convex and discrete geometry, statistical learning theory, etc. Our entropy method, which evolved from the work of S. Mendelson and the second author [MV 03], is motivated by several problems in empirical processes, asymptotic convex geometry and operator theory.

Throughout the paper, $F$ is a class of real valued functions on some domain $\Omega$. It is a central problem of the theory of empirical processes to determine whether the classical limit theorems hold uniformly over $F$. Let $\mu$ be a probability distribution on $\Omega$, and let $X_1, X_2, \ldots \in \Omega$ be independent samples distributed according to the common law $\mu$. The problem is to determine whether the sequence of real valued random variables $(f(X_i))$ obeys the central limit theorem uniformly over all $f \in F$ and over all underlying probability distributions $\mu$, i.e. whether the random variable
$$\frac{1}{\sqrt{N}} \sum_{i=1}^N \big(f(X_i) - \mathbb{E} f(X_i)\big)$$
converges to a Gaussian random variable uniformly. With the right definition of the convergence, if that happens, $F$ is called a uniform Donsker class. The precise definition can be found in [Du 99].

The pioneering work of Vapnik and Chervonenkis [VC 68], [VC 71], [VC 81] demonstrated that the validity of the uniform limit theorems on $F$ is connected with the combinatorial structure of $F$, which is quantified by what we call the combinatorial dimension of $F$. For classes of $\{0,1\}$-valued functions, it is the classical Vapnik-Chervonenkis dimension. For a general class $F$ and $t > 0$, a subset $\sigma$ of $\Omega$ is called $t$-shattered by $F$ if there exists a level function $h$ on $\sigma$ such that, given any partition $\sigma = \sigma_- \cup \sigma_+$, one can find a function $f \in F$ with $f(x) \le h(x)$ if $x \in \sigma_-$ and $f(x) \ge h(x) + t$ if $x \in \sigma_+$. The combinatorial dimension of $F$, denoted by $v(F,t)$, is the maximal cardinality of a set $t$-shattered by $F$. Simply speaking, $v(F,t)$ is the maximal size of a set on which $F$ oscillates in all possible $\pm t/2$ ways around some level $h$.
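To make the definition concrete, here is a brute-force sketch (exponential time, only for tiny examples) that computes $v(F,t)$ for a finite class $F$ on a finite domain $\{0, \ldots, n-1\}$, with each function stored as a tuple of its values. The function names are illustrative and not from the paper; the search over level functions is restricted to values attained by $F$, which, as explained in the comments, loses no generality.

```python
from itertools import combinations, product


def is_t_shattered(F, sigma, t):
    """Return True if the points in sigma are t-shattered by the finite class F.

    F is a list of tuples with F[k][x] = value of the k-th function at point x.
    Level values h(x) may be taken from {f(x) : f in F}: lowering h(x) to the
    largest attained value below it keeps every required inequality true.
    """
    levels = [sorted({f[x] for f in F}) for x in sigma]
    for h in product(*levels):
        # plus[k] is True when sigma[k] belongs to sigma_+ (f must reach h + t there)
        if all(
            any(all(f[x] >= hx + t if p else f[x] <= hx
                    for x, hx, p in zip(sigma, h, plus))
                for f in F)
            for plus in product((False, True), repeat=len(sigma))
        ):
            return True
    return False


def combinatorial_dimension(F, t):
    """Brute-force v(F, t): the largest cardinality of a t-shattered set."""
    n = len(F[0])
    v = 0
    for k in range(1, n + 1):
        # subsets of a t-shattered set are t-shattered, so stop at the first failure
        if any(is_t_shattered(F, sigma, t) for sigma in combinations(range(n), k)):
            v = k
        else:
            break
    return v


cube = [(0, 0), (0, 1), (1, 0), (1, 1)]  # all {0,1}-valued functions on 2 points
print(combinatorial_dimension(cube, 1))  # 2
print(combinatorial_dimension(cube, 2))  # 0
```

For $\{0,1\}$-valued classes and $t = 1$ this search reduces to computing the Vapnik-Chervonenkis dimension.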

Connections between the combinatorial dimension (and its variants) and the limit theorems of probability theory have been the major theme of many papers. For a comprehensive account of what was known about these profound connections by 1999, we refer the reader to the book of Dudley [Du 99].

Dudley proved that a class of $\{0,1\}$-valued functions is a uniform Donsker class if and only if its combinatorial (Vapnik-Chervonenkis) dimension $v(F,1)$ is finite. This is one of the main results on the empirical processes for $\{0,1\}$ classes. The problem for general classes turned out to be much harder [T 02], [MV 03]. In the present paper we prove an optimal integral description of uniform Donsker classes in terms of the combinatorial dimension.

Theorem 1.1 Let $F$ be a uniformly bounded class of functions. Then
$$\int_0^\infty \sqrt{v(F,t)}\,dt < \infty \;\Longrightarrow\; F \text{ is uniform Donsker} \;\Longrightarrow\; v(F,t) = O(t^{-2}).$$

This trivially contains Dudley's theorem on the $\{0,1\}$ classes. M. Talagrand proved Theorem 1.1 with an extra factor of $\log^M(1/t)$ in the integrand and asked about the optimal value of the absolute constant exponent $M$ [T 92], [T 02]. Talagrand's proof was based on a very involved iterational argument. In [MV 03], S. Mendelson and the second author introduced a new combinatorial idea. Their approach led to a much clearer proof, which allowed them to reduce the exponent to $M = 1/2$. Theorem 1.1 removes the logarithmic factor completely, thus the optimal exponent is $M = 0$. Our argument relies significantly on the ideas originating in [MV 03] and also uses a new iterational method. The second implication of Theorem 1.1, which makes sense for $t \to 0$, is well known ([Du 99] 10.1).

Theorem 1.1 reduces to estimating the metric entropy of $F$ by the combinatorial dimension of $F$. For $t > 0$, the Koltchinskii-Pollard entropy of $F$ is
$$D(F,t) = \log \sup \Big\{ n \;\Big|\; \exists f_1, \ldots, f_n \in F \ \ \forall i < j: \ \int (f_i - f_j)^2 \, d\mu > t^2 \Big\},$$
where the supremum is taken over $n$ and over all probability measures $\mu$ supported by finite subsets of $\Omega$. It is easily seen that $D(F,t)$ dominates the combinatorial dimension: $D(F,t) \ge c\, v(F,2t)$ for an absolute constant $c > 0$. Theorem 1.1 should then be compared to the fundamental description valid for all uniformly bounded classes:
$$\int_0^\infty \sqrt{D(F,t)}\,dt < \infty \;\Longrightarrow\; F \text{ is uniform Donsker} \;\Longrightarrow\; D(F,t) = O(t^{-2}). \qquad (1.1)$$

The left part of (1.1) is a strengthening of Pollard's central limit theorem and is due to Giné and Zinn (see [GZ], [Du 99] 10.3, 10.1). The right part is an observation due to Dudley ([Du 99] 10.1).
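The supremum over all finitely supported measures is what makes $D(F,t)$ hard to compute. For one fixed measure $\mu$ on a finite domain, however, a greedily built $t$-separated family gives a lower bound for the quantity inside the supremum, and hence for $D(F,t)$. The sketch below assumes natural logarithms and functions stored as tuples of values; the helper names are hypothetical.

```python
import math
from itertools import product


def l2_distance(f, g, mu):
    """Distance between f and g in L2(mu) for a measure mu on a finite domain."""
    return math.sqrt(sum(m * (a - b) ** 2 for a, b, m in zip(f, g, mu)))


def greedy_separated(F, mu, t):
    """Keep each function whose L2(mu)-distance to every kept function exceeds t."""
    kept = []
    for f in F:
        if all(l2_distance(f, g, mu) > t for g in kept):
            kept.append(f)
    return kept


def entropy_lower_bound(F, mu, t):
    """log of the size of a t-separated subfamily: a lower bound for D(F, t)."""
    return math.log(len(greedy_separated(F, mu, t)))


F = list(product((0, 1), repeat=3))   # all {0,1}-valued functions on 3 points
mu = [1 / 3] * 3                      # uniform probability measure
print(len(greedy_separated(F, mu, 0.5)))  # 8: distinct functions are >= sqrt(1/3) apart
print(len(greedy_separated(F, mu, 0.6)))  # 4
```

Because the greedy family is only one admissible configuration for one measure, the result never exceeds $D(F,t)$; it can be far below it.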

An advantage of the combinatorial description in Theorem 1.1 over the entropic description in (1.1) is that the combinatorial dimension is much easier to bound than the Koltchinskii-Pollard entropy (see [KB]). Large sets on which $F$ oscillates in all $\pm t/2$ ways are such sound structures that their existence can hopefully be easily detected or eliminated, which leads to an estimate on the combinatorial dimension. In contrast, bounding the Koltchinskii-Pollard entropy involves eliminating all large separated configurations $f_1, \ldots, f_n$ with respect to all probability measures $\mu$; this can be a hard problem even on the plane (for a two-point domain $\Omega$).

The nontrivial part of Theorem 1.1 follows from (1.1) and the central result of this paper:

Theorem 1.2 For every class $F$,
$$\int_0^\infty \sqrt{D(F,t)}\,dt \asymp \int_0^\infty \sqrt{v(F,t)}\,dt.$$

The equivalence $\asymp$ is up to an absolute constant factor $C$, thus $a \asymp b$ iff $a/C \le b \le Ca$.

Looking at Theorem 1.2, one naturally asks whether the Koltchinskii-Pollard entropy is pointwise equivalent to the combinatorial dimension. M. Talagrand indeed proved this for uniformly bounded classes under minimal regularity and up to a logarithmic factor. For the moment, we consider a simpler version of this regularity assumption: there exists an $a > 1$ such that
$$v(F, at) \le \tfrac12\, v(F,t) \quad \text{for all } t > 0. \qquad (1.2)$$

In 1992, M. Talagrand proved, essentially under (1.2), that for $0 < t < 1/2$
$$c\,v(F,2t) \le D(F,t) \le C\,v(F,ct)\log^M(1/t) \qquad (1.3)$$
[T 92]; see also [T 87], [T 02]. Here $c > 0$ is an absolute constant and $M$ depends only on $a$. The question of the value of the exponent $M$ has been open. S. Mendelson and the second author proved (1.3) without the minimal regularity assumption (1.2) and with $M = 1$, which is the optimal exponent in that case. The present paper proves that with the minimal regularity assumption, the exponent reduces to $M = 0$, thus completely removing both the boundedness assumption and the logarithmic factor from Talagrand's inequality (1.3). As far as we know, this unexpected fact was not even conjectured.

Theorem 1.3 Let $F$ be a class which satisfies the minimal regularity assumption (1.2). Then for all $t > 0$
$$c\,v(F,2t) \le D(F,t) \le C\,v(F,ct),$$
where $c > 0$ is an absolute constant and $C$ depends only on $a$ in (1.2).

Therefore, in the presence of minimal regularity, the Koltchinskii-Pollard entropy and the combinatorial dimension are equivalent. Rephrasing M. Talagrand's comments from [T 02] on his inequality (1.3), Theorem 1.3 is of the type "concentration of pathology". Suppose we know that $D(F,t)$ is large. This simply means that $F$ contains many well separated functions, but we know very little about what kind of pattern they form. The content of Theorem 1.3 is that it is possible to construct a large set $\sigma$ on which not only are many functions in $F$ well separated from each other, but they also oscillate in all possible $\pm ct$ ways. We now have a very precise structure that witnesses that $F$ is large. This result is exactly in line with Talagrand's celebrated characterization of Glivenko-Cantelli classes [T 87], [T 96].

Theorem 1.3 remains true if one replaces the $L_2$ norm in the definition of the Koltchinskii-Pollard entropy by the $L_p$ norm for $1 < p < \infty$. The extremal case $p = \infty$ is important and more difficult. The $L_\infty$ entropy is naturally
$$D_\infty(F,t) = \log \sup \Big\{ n \;\Big|\; \exists f_1, \ldots, f_n \in F \ \ \forall i < j: \ \sup_{\omega} |(f_i - f_j)(\omega)| > t \Big\}.$$

Assume that $F$ is uniformly bounded (in absolute value) by 1. Even then $D_\infty(F,t)$ cannot be bounded by a function of $t$ and $v(F,ct)$: to see this, it is enough to take for $F$ the collection of the indicator functions of the intervals $[2^{-k-1}, 2^{-k}]$, $k \in \mathbb{N}$, in $\Omega = [0,1]$. However, if $\Omega$ is finite, it is an open question how the $L_\infty$ entropy depends on the size of $\Omega$. N. Alon et al. [ABCH] proved that if $|\Omega| = n$ then $D_\infty(F,t) = O(\log^2 n)$ for fixed $t$ and $v(F,ct)$. They asked whether the exponent 2 can be reduced. We answer this by reducing 2 to any number larger than the minimal possible value 1. For every $\varepsilon \in (0,1)$,
$$D_\infty(F,t) \le C v \log(n/vt) \cdot \log^\varepsilon(n/v), \quad \text{where } v = v(F, c\varepsilon t), \qquad (1.4)$$
and where $C, c > 0$ are absolute constants. One can look at this estimate as a continuous asymptotic version of the Sauer-Shelah Lemma. The dependence on $t$ is optimal, but conjecturally the factor $\log^\varepsilon(n/v)$ can be removed.
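For $\{0,1\}$-valued classes the discrete analogue of (1.4) is classical: distinct functions are at $L_\infty$ distance 1, so $D_\infty(F,t) = \log|F|$ for $0 < t < 1$, and the Sauer-Shelah lemma gives $|F| \le \sum_{k=0}^{v} \binom{n}{k} \le (en/v)^v$, with $v$ the Vapnik-Chervonenkis dimension. The brute-force sketch below checks this on the class of indicators of discrete intervals, which attains equality; the example class and the helper names are our own illustrative choices.

```python
import math
from itertools import combinations


def interval_class(n):
    """Indicators of the discrete intervals {i, ..., j-1} in {0, ..., n-1}, plus the empty set."""
    F = {(0,) * n}
    for i in range(n):
        for j in range(i + 1, n + 1):
            F.add(tuple(1 if i <= x < j else 0 for x in range(n)))
    return F


def vc_dimension(F, n):
    """Largest k such that some k points are shattered (all 2^k patterns occur)."""
    v = 0
    for k in range(1, n + 1):
        if any(len({tuple(f[x] for x in s) for f in F}) == 2 ** k
               for s in combinations(range(n), k)):
            v = k
        else:
            break  # shattering is hereditary
    return v


def sauer_shelah_bound(n, v):
    return sum(math.comb(n, k) for k in range(v + 1))


n = 8
F = interval_class(n)
v = vc_dimension(F, n)
print(len(F), v, sauer_shelah_bound(n, v))                # 37 2 37
print(math.log(len(F)) <= v * math.log(math.e * n / v))  # True
```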

The combinatorial method of this paper applies to the study of coordinate sections of a symmetric convex body $K$ in $\mathbb{R}^n$. The average size of $K$ is commonly measured by the so-called $M$-estimate, $M_K = \int_{S^{n-1}} \|x\|_K \, d\sigma(x)$, where $\sigma$ is the normalized Lebesgue measure on the unit Euclidean sphere $S^{n-1}$ and $\|\cdot\|_K$ is the Minkowski functional of $K$. Passing from the average on the sphere to the Gaussian average on $\mathbb{R}^n$, Dudley's entropy integral connects the $M$-estimate to the integral of the metric entropy of $K$; then Theorem 1.2 replaces the entropy by the combinatorial dimension of $K$. The latter has a remarkable geometric representation, which leads to the following result. For $1 \le p < \infty$ denote by $B_p^n$ the unit ball of the space $\ell_p^n$:
$$B_p^n = \{x \in \mathbb{R}^n : |x_1|^p + \cdots + |x_n|^p \le 1\}.$$

If $M_K$ is large (and thus $K$ is small "on average"), then there exists a coordinate section of $K$ contained in the normalized octahedron $D = \sqrt{n} B_1^n$. Note that the $M$-estimate $M_D$ of this octahedron is bounded by an absolute constant. In the rest of the paper, $C, C', C_1, c, c', c_1, \ldots$ will denote positive absolute constants whose values may change from line to line.
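The $M$-estimate is easy to approximate by Monte Carlo integration. The sketch below draws uniform points of $S^{n-1}$ by normalizing standard Gaussian vectors (a standard sampling device, not a construction from the paper) and averages the Minkowski functional. It also illustrates that $M_D$ for $D = \sqrt{n} B_1^n$ stays bounded, since $\|x\|_D = \|x\|_1/\sqrt{n} \le \|x\|_2$. Sample sizes and seeds are arbitrary choices.

```python
import math
import random


def sphere_point(n, rng):
    """Uniform random point on S^{n-1}: a normalized standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(x * x for x in g))
    return [x / r for x in g]


def m_estimate(norm, n, samples=4000, seed=0):
    """Monte Carlo estimate of M_K, the average of ||x||_K over the unit sphere."""
    rng = random.Random(seed)
    return sum(norm(sphere_point(n, rng)) for _ in range(samples)) / samples


def l1(x):
    return sum(abs(t) for t in x)


def l2(x):
    return math.sqrt(sum(t * t for t in x))


def linf(x):
    return max(abs(t) for t in x)


n = 50
m_ball = m_estimate(l2, n)                              # K = B_2^n: exactly 1
m_octa = m_estimate(lambda x: l1(x) / math.sqrt(n), n)  # K = sqrt(n) B_1^n
m_cube = m_estimate(linf, n)                            # K = B_inf^n
print(m_ball, m_octa, m_cube)
```

For $K = \sqrt{n} B_1^n$ the estimate approaches $\sqrt{2/\pi} \approx 0.80$ as $n$ grows.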

Theorem 1.4 Let $K$ be a symmetric convex body containing the unit Euclidean ball $B_2^n$, and let $M = c M_K \log^{-3/2}(2/M_K)$. Then there exists a subset $\sigma$ of $\{1, \ldots, n\}$ of size $|\sigma| \ge M^2 n$, and such that
$$M (K \cap \mathbb{R}^\sigma) \subset \sqrt{|\sigma|}\, B_1^\sigma. \qquad (1.5)$$

Recall that the classical Dvoretzky theorem in the form of Milman guarantees, for $M = M_K$, the existence of a subspace $E$ of dimension $\dim E \ge c M^2 n$ and such that
$$c_1 B_2^n \cap E \subset M (K \cap E) \subset c_2 B_2^n \cap E. \qquad (1.6)$$

To compare the second inclusion of (1.6) to (1.5), recall that by Kashin's theorem ([K 77], [K 85]; see [HIl] 6) there exists a subspace $E$ in $\mathbb{R}^\sigma$ of dimension at least $|\sigma|/2$ such that the section $\sqrt{|\sigma|}\, B_1^\sigma \cap E$ is equivalent to $B_2^\sigma \cap E$.

A reformulation of Theorem 1.4 in operator language generalizes the restricted invertibility principle of Bourgain and Tzafriri [BT 87] to all normed spaces. Consider a linear operator $T: \ell_2^n \to X$ acting from the Hilbert space into an arbitrary Banach space $X$. The "average" largeness of such an operator is measured by its $\ell$-norm, defined as $\ell(T)^2 = \mathbb{E}\|Tg\|^2$, where $g = (g_1, \ldots, g_n)$ and the $g_i$ are normalized independent Gaussian random variables. We prove that if $\ell(T)$ is large then $T$ is well invertible on some large coordinate subspace. For simplicity, we state this here for spaces of type 2 (see [LT] 9.2), which include, for example, all the $L_p$ spaces and their subspaces for $2 \le p < \infty$. For general spaces, see Section 7.

Theorem 1.5 (General Restricted Invertibility) Let $T: \ell_2^n \to X$ be a linear operator with $\ell(T)^2 \ge n$, where $X$ is a normed space of type 2. Let $a = c \log^{-3/2}(2\|T\|)$. Then there exists a subset $\sigma$ of $\{1, \ldots, n\}$ of size $|\sigma| \ge a^2 n/\|T\|^2$ and such that
$$\|Tx\| \ge a \beta_X \|x\| \quad \text{for all } x \in \mathbb{R}^\sigma,$$
where $c > 0$ is an absolute constant and $\beta_X > 0$ depends only on the type 2 constant of $X$.

Bourgain and Tzafriri essentially proved this restricted invertibility principle for $X = \ell_2$ (and without the logarithmic factor), in which case $\ell(T)$ equals the Hilbert-Schmidt norm of $T$.
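For $X = \ell_2$ the identity $\ell(T) = \|T\|_{HS}$ follows from $\mathbb{E}\|Tg\|_2^2 = \sum_{r,c} T_{rc}^2$ for a standard Gaussian vector $g$. The Monte Carlo sketch below checks this numerically for a small matrix; the matrix, sample size and seed are arbitrary illustrative choices.

```python
import math
import random


def ell_norm_l2(T, samples=20000, seed=1):
    """Monte Carlo estimate of l(T) = (E ||T g||_2^2)^(1/2), g standard Gaussian."""
    rng = random.Random(seed)
    m, n = len(T), len(T[0])
    total = 0.0
    for _ in range(samples):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        Tg = [sum(T[r][c] * g[c] for c in range(n)) for r in range(m)]
        total += sum(y * y for y in Tg)
    return math.sqrt(total / samples)


def hilbert_schmidt(T):
    """Hilbert-Schmidt (Frobenius) norm of the matrix T."""
    return math.sqrt(sum(x * x for row in T for x in row))


T = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]
print(ell_norm_l2(T), hilbert_schmidt(T))  # both close to sqrt(7) = 2.6457...
```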

The heart of our method is a result of combinatorial geometric flavor. We compare the covering number of a convex body $K$ by a given convex body $D$ to the number of the integer cells contained in $K$ and its projections. This will be explained in detail in Section 2. All main results of this paper are then deduced from this principle. The basic covering result of this type and its proof occupy Section 3. First applications to covering $K$ by ellipsoids and cubes appear in Section 4. Estimate (1.4) is also proved there. Section 5 deals with covering by balls of a general Lorentz space; the combinatorial dimension controls such coverings. From this we deduce in Section 6 our main results, Theorems 1.2 and 1.3. Theorem 1.2 shows in particular that in the classical Dudley entropy integral, the entropy can be replaced by the combinatorial dimension. This yields a new powerful bound on Gaussian processes (see Theorem 6.5 below), which is a quantitative version of Theorem 1.1. This method is used in Section 7 to prove Theorem 1.4 on the coordinate sections of convex bodies. Theorem 1.4 is equivalently expressed in operator language as a general principle of restricted invertibility, which implies Theorem 1.5.

2 The Method

Let $K$ and $D$ be convex bodies in $\mathbb{R}^n$. We are interested in the covering number $N(K,D)$, the minimal number of translates of $D$ needed to cover $K$. More precisely, $N(K,D)$ is the minimal number $N$ for which there exist points $x_1, x_2, \ldots, x_N$ satisfying
$$K \subset \bigcup_{j=1}^N (x_j + D).$$

Computing the covering number is a very difficult problem even in the plane [CFG]. Our main idea is to relate the covering number to the cell content of $K$, which we define as the number of the integer cells contained in all coordinate projections of $K$:
$$\Sigma(K) = \sum_P \#\{\text{integer cells contained in } PK\}. \qquad (2.1)$$

The sum is over all $2^n$ coordinate projections in $\mathbb{R}^n$, i.e. over the orthogonal projections $P$ onto $\mathbb{R}^\sigma$ with $\sigma \subset \{1, \ldots, n\}$. The integer cells are the unit cubes with integer vertices, i.e. the sets of the form $a + [0,1]^\sigma$, where $a \in \mathbb{Z}^\sigma$. For convenience, we include the empty set $\sigma = \emptyset$ in the counting and assign value 1 to the corresponding summand.

Let $D$ be an integer cell. To compare $N(K,D)$ to $\Sigma(K)$ on a simple example, take $K$ to be an integer box, i.e. the product of $n$ intervals with integer endpoints and lengths $a_i \ge 0$, $i = 1, \ldots, n$. Then $N(K,D) = \prod_{i=1}^n \max(a_i, 1)$ and $\Sigma(K) = \prod_{i=1}^n (a_i + 1)$. Thus
$$2^{-n} \Sigma(K) \le N(K,D) \le \Sigma(K).$$
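The two product formulas can be checked directly. The sketch below computes $\Sigma(K)$ for an integer box by literally summing over all $2^n$ coordinate projections (each projection $P_\sigma K$ is a box with sides $a_i$, $i \in \sigma$, containing $\prod_{i \in \sigma} a_i$ integer cells) and compares it with $\prod_i (a_i + 1)$ and with $N(K,D) = \prod_i \max(a_i, 1)$. The function names are hypothetical.

```python
import math
from itertools import combinations


def cell_content_box(lengths):
    """Sigma(K) for an integer box with side lengths a_i, summed over all projections."""
    n = len(lengths)
    total = 0
    for k in range(n + 1):
        for sigma in combinations(range(n), k):
            # P_sigma K contains prod_{i in sigma} a_i integer cells; the empty
            # projection gives the empty product 1, matching the convention above.
            total += math.prod(lengths[i] for i in sigma)
    return total


def covering_by_cells(lengths):
    """N(K, D) for an integer box K and an integer cell D."""
    return math.prod(max(a, 1) for a in lengths)


a = [2, 3, 0, 1]
S, N = cell_content_box(a), covering_by_cells(a)
print(S, math.prod(x + 1 for x in a), N)  # 24 24 6
print(2 ** -len(a) * S <= N <= S)         # True
```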

While the lower bound is trivially true for any convex body $K$, an upper bound of this type is in general difficult to prove. This motivates the following conjecture.

Conjecture 2.1 (Covering Conjecture) Let $K$ be a convex body in $\mathbb{R}^n$ and $D$ be an integer cell. Then
$$N(K,D) \le \Sigma(CK)^C. \qquad (2.2)$$

Our main result is that the Covering Conjecture holds for a body $D$ slightly larger than an integer cell, namely for
$$D = \Big\{x \in \mathbb{R}^n : \frac{1}{n} \sum_{i=1}^n \exp\exp|x(i)| \le 5\Big\}. \qquad (2.3)$$
Note that the body $5D$ contains an integer cell and the body $(5 \log\log n)^{-1} D$ is contained in an integer cell.

Theorem 2.2 Let $K$ be a convex body in $\mathbb{R}^n$ and $D$ be the body (2.3). Then
$$N(K,D) \le \Sigma(CK)^C.$$

As a useful consequence, the Covering Conjecture holds for $D$ being an ellipsoid. This will follow by a standard factorization technique for absolutely summing operators.

Corollary 2.3 Let $K$ be a convex body in $\mathbb{R}^n$ and $D$ be an ellipsoid in $\mathbb{R}^n$ that contains an integer cell. Then
$$N(K,D) \le \Sigma(CK)^C.$$

As for the Covering Conjecture itself, it holds under the assumption that the covering number is exponentially large in $n$. Say, assume $N(K,D) \ge \exp(an)$, where $D$ is an integer cell. Then
$$N(K,D) \le \Sigma(CK)^M, \quad \text{where } M \sim \log^{0.001}(2/a). \qquad (2.4)$$
The exponent 0.001 can be replaced by an arbitrarily small positive number. This result also follows from Theorem 2.2.

The usefulness of Theorem 2.2 is understood through a relation between the cell content and the combinatorial dimension. Let $F$ be a class of real valued functions on a finite set $\Omega$, which we identify with $\{1, \ldots, n\}$. Then we can look at $F$ as a subset of $\mathbb{R}^n$ via the map $f \mapsto (f(i))_{i=1}^n$. For simplicity assume that $F$ is a convex set; the general case will not be much more difficult. It is then easy to check that the combinatorial dimension $v := v(F,1)$ equals exactly the maximal rank of a coordinate projection $P$ in $\mathbb{R}^n$ such that $PF$ contains a translate of the unit cube $P[0,1]^n$. Then in the sum (2.1) for the cell content $\Sigma(F)$, the summands with $\mathrm{rank}\, P > v$ vanish. The number of nonzero summands is then at most $\sum_{k=0}^{v} \binom{n}{k}$. Every summand is clearly bounded by $\mathrm{vol}(PF)$, a quantity which can be easily estimated if the class $F$ is a priori well bounded. So $\Sigma(F)$ is essentially bounded by $\sum_{k=0}^{v} \binom{n}{k}$, and is thus controlled by the combinatorial dimension $v$. This way, Theorem 2.2 or one of its consequences can be used to bound the entropy of $F$ by its combinatorial dimension. For instance, (2.4) implies (1.4) in this way.
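The count of nonzero summands is where the combinatorial dimension enters: there are $\sum_{k=0}^{v} \binom{n}{k}$ coordinate projections of rank at most $v$, and the standard estimate $\sum_{k=0}^{v} \binom{n}{k} \le (en/v)^v$ (valid for $1 \le v \le n$) shows that this number is polynomial in $n$ for fixed $v$, in contrast with the $2^n$ projections in (2.1). A quick numerical check of that estimate, with an illustrative helper name:

```python
import math


def low_rank_projections(n, v):
    """Number of coordinate projections of rank at most v in R^n."""
    return sum(math.comb(n, k) for k in range(v + 1))


for n, v in [(20, 2), (100, 3), (1000, 5)]:
    count = low_rank_projections(n, v)
    print(n, v, count, count <= (math.e * n / v) ** v, count < 2 ** n)
```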

In some cases, $n$ can be removed from the bound on the entropy, which gives an estimate independent of the size of the domain $\Omega$. Arguably the most general situation in which this happens is when $F$ is bounded in some norm and the entropy is computed with respect to a weaker norm. The entropy of the class $F$ with respect to the norm of a general function space $X$ on $\Omega$ is
$$D(F,X,t) = \log \sup \big\{ n \;\big|\; \exists f_1, \ldots, f_n \in F \ \ \forall i < j: \ \|f_i - f_j\|_X > t \big\}. \qquad (2.5)$$

The Koltchinskii-Pollard entropy is then $D(F,t) = \sup_\mu D(F, L_2(\mu), t)$, where the supremum is over all probability measures supported by finite sets. With the geometric representation as above,
$$D(F,X,t) = \log N^{\mathrm{pack}}\big(F, \tfrac{t}{2}\,\mathrm{Ball}(X)\big), \qquad (2.6)$$
where $\mathrm{Ball}(X)$ denotes the unit ball of $X$ and $N^{\mathrm{pack}}(A,B)$ is the packing number, which is the maximal number of disjoint translates of a set $B \subset \mathbb{R}^n$ by vectors from a set $A \subset \mathbb{R}^n$. For symmetric convex $B$, the packing and the covering numbers are easily seen to be equivalent:
$$N^{\mathrm{pack}}(A,B) \le N(A,B) \le N^{\mathrm{pack}}(A, \tfrac12 B). \qquad (2.7)$$

To estimate $D(F,X,t)$, we have to be able to compare quantitatively the norm in the function space $X$ with the norm in another function space $Y$ in which $F$ is known to be bounded. We shall consider Lorentz spaces, for which such a comparison is especially transparent. The Lorentz space $\Lambda_\phi = \Lambda_\phi(\Omega, \mu)$ is determined by its generating function $\phi(t)$, which is a real convex function on $[0,\infty)$ with $\phi(0) = 0$, increasing to infinity. Then $\Lambda_\phi$ is the space of functions $f$ on $\Omega$ for which there exists a $\lambda > 0$ such that
$$\mu\{|f/\lambda| > t\} \le \frac{1}{\phi(t)} \quad \text{for all } t > 0. \qquad (2.8)$$

The norm of $f$ in $\Lambda_\phi$ is the infimum of all $\lambda > 0$ satisfying (2.8). Given two Lorentz spaces $\Lambda_\phi$ and $\Lambda_\psi$, we look at their comparison function
$$(\phi|\psi)(t) = \sup\{\phi(s) \mid \phi(s) \ge t\,\psi(s)\}.$$
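On a finite domain with the uniform measure, (2.8) can be solved for the norm explicitly. Assuming, beyond the stated properties, that $\phi$ is continuous and strictly increasing with inverse $\phi^{-1}$, a value $\lambda$ is admissible exactly when $\lambda \ge a/\phi^{-1}(n/m(a))$ for every value $a > 0$ of $|f|$, where $m(a) = \#\{i : |f(i)| \ge a\}$. The sketch below uses this, together with a bisection for the comparison function that assumes $\phi(s)/\psi(s)$ is decreasing in $s$; both assumptions hold for the power functions $t^p$, $t^q$ with $p < q$ discussed below. Function names are illustrative.

```python
import math


def lorentz_norm(values, phi_inv):
    """Norm of f in Lambda_phi on {1..n} with the uniform measure, from (2.8).

    Assumes phi is continuous and strictly increasing, with inverse phi_inv.
    """
    n = len(values)
    mags = [abs(x) for x in values]
    norm = 0.0
    for a in set(mags):
        if a > 0:
            count = sum(1 for m in mags if m >= a)  # m(a) = #{|f| >= a}
            norm = max(norm, a / phi_inv(n / count))
    return norm


def comparison(phi, psi, t, lo=1e-12, hi=1e12, steps=200):
    """(phi|psi)(t) = sup{phi(s) : phi(s) >= t * psi(s)}, by bisection on s.

    Assumes phi(s)/psi(s) decreases in s, so the admissible s form an interval
    (0, s*], and that phi is increasing, so the supremum equals phi(s*).
    """
    for _ in range(steps):
        mid = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        if phi(mid) >= t * psi(mid):
            lo = mid
        else:
            hi = mid
    return phi(lo)


p, q = 1, 2  # weak L_1 and weak L_2


def weak_lp_inv(y):
    """Inverse of the generating function phi(t) = t^p."""
    return y ** (1 / p)


print(lorentz_norm([2, 0, 0, 0], weak_lp_inv))  # 0.5
print(comparison(lambda s: s ** p, lambda s: s ** q, 0.01), 0.01 ** (-p / (q - p)))  # both ~100
```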

Under the normalization assumption $\phi(1) = \psi(1) = 1$ and a mild regularity assumption on $\phi$, we prove the following. If a class $F$ is 1-bounded in $\Lambda_\psi$, then for all $0 < t < 1/2$
$$D(F, \Lambda_\phi, t) \le C\,v(F, ct) \cdot \log (\phi|\psi)(t/2). \qquad (2.9)$$

An important point here is that the entropy bound is independent of the size of the domain $\Omega$. To prove (2.9), we first perform a probabilistic selection, which reduces the size of $\Omega$, and then apply Theorem 2.2, in which we replace $D$ by a larger set $\mathrm{Ball}(\Lambda_\phi)$.

Of particular interest are the generating functions $\phi(t) = t^p$ and $\psi(t) = t^q$ with $1 < p < q < \infty$. They define the weak $L_p$ and weak $L_q$ spaces respectively. Their comparison function is $(\phi|\psi)(t) = t^{-p/(q-p)}$. Then, passing to the usual $L_p$ spaces (which is not difficult), one obtains from (2.9) the following. If $F$ is 1-bounded in $L_q(\mu)$, then for all $0 < t < 1/2$
$$D(F, L_p(\mu), t) \le C_{p,q}\, v(F, c_{p,q} t) \cdot \log(1/t), \qquad (2.10)$$
where $C_{p,q}$ and $c_{p,q} > 0$ depend only on $p$ and $q$.

First estimates of type (2.10) go back to the influential works of Vapnik and Chervonenkis. In the main combinatorial lemma of [VC 81], the volume of a uniformly bounded convex class was estimated via a quantity somewhat weaker than the combinatorial dimension. Since we always have $N(K,D) \ge \mathrm{vol}(K)/\mathrm{vol}(D)$, the Vapnik-Chervonenkis bound is an asymptotically weaker form of (2.10) for $p = 2$ (say) and $q = \infty$. Talagrand [T 87], [T 02] proved (2.10) for $p = 2$, $q = \infty$ up to a factor of $\log^M(1/t)$ in the right side and under minimal regularity (essentially under (1.2)). Based on the method of N. Alon et al. from [ABCH], Bartlett and Long [BL] proved (2.10) for $p = 1$, $q = \infty$ with an additional factor of $\log(|\Omega|/v)$ in the right side, where $v = v(F,ct)$. The ratio $|\Omega|/v$ was removed from this factor by Bartlett, Kulkarni and Posner [BKP], thus yielding (2.10) with $\log^2(1/t)$ for $p = 1$, $q = \infty$. The optimal estimate (2.10) for all $p$ and for $q = \infty$ was proved by Mendelson and the second author as the main result of [MV 03]. Finally, the present paper proves (2.10) for all $p$ and $q$.

Theorems 1.2 and 1.3 are then proved by iterating (2.10) with $2p = q < \infty$ to get rid of both the logarithmic factor and any boundedness assumptions.

3 Covering by the Tower 

Fix a probability space $(\Omega, \mu)$. As most of our problems have a discrete nature, they essentially reduce by approximation to the case of a finite $\Omega$ and the uniform measure $\mu$. The core difficulties arise already in this finite setting, although it took some time to fully realize this (see [T 96]). In this way we shall totally ignore measurability issues.

Tower. Our main covering result works for a body in $\mathbb{R}^n$ which is $\log\log n$ apart from the unit cube, while for the cube itself it remains an open problem. This body is the unit ball of the Lorentz space with generating function of the order $e^{e^t}$. For extra flexibility, we shall allow a parameter $\alpha \ge 2$, generally a large number. The Lorentz space generated by the function
$$\theta(t) = \theta_\alpha(t) = e^{\alpha^t - \alpha}, \quad t \ge 1,$$
is called the tower space, and its unit ball is called the tower. Since $\theta(1) = 1$, it does not matter how we define $\theta(t)$ for $0 \le t \le 1$ as long as $\theta(0) = 0$ and $\theta$ is convex; say, $\theta(t) = t$ will work.

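In the discrete setting, we take $\Omega = \{1, \ldots, n\}$ with the uniform probability measure $\mu$ on $\Omega$. The tower space can be realized on $\mathbb{R}^n$ by identifying a function $f$ on $\Omega$ with a point in $\mathbb{R}^n$ via the map $f \mapsto (f(i))_{i=1}^n$. The tower is then a convex symmetric body in $\mathbb{R}^n$, and we denote it by $\mathrm{Tower}_\alpha^n$. This body is equivalently described by
$$c_1(\alpha) D \subset \mathrm{Tower}_\alpha^n \subset C_2(\alpha) D,$$
where $D$ is the body (2.3) and the positive constants $c_1(\alpha)$ and $C_2(\alpha)$ depend only on $\alpha$.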

Coordinate convexity. We stated our results for convex bodies but for not necessarily convex function classes. Convexity indeed plays very little role in our work and is replaced by a much weaker notion of coordinate convexity. This notion was originally motivated by problems of calculus of variations, partial differential equations and probability. The interested reader may consult the paper [M] and the bibliography cited there as an introduction to the subject.

One can obtain a general convex body in $\mathbb{R}^n$ by cutting off half-spaces. Similarly, a general coordinate convex body in $\mathbb{R}^n$ is obtained by cutting off octants, that is, translates of the subsets of $\mathbb{R}^n$ consisting of points with fixed and nonzero signs of the coordinates. The coordinate convex hull of a set $K$ in $\mathbb{R}^n$, denoted by $\mathrm{cconv}(K)$, is the minimal coordinate convex set containing $K$. In other words, $\mathrm{cconv}(K)$ is what remains of $\mathbb{R}^n$ after the removal of all octants disjoint from $K$. Clearly, every convex set is coordinate convex; the converse is not true, as the example of the cross $\{(x,y) \mid x = 0 \text{ or } y = 0\}$ in $\mathbb{R}^2$ shows.



Figure: Example of a coordinate convex body.

Covering by the tower. Let $A$ be a nonempty set in $\mathbb{R}^n$. In contrast to what happens in classical convexity, a coordinate projection of a coordinate convex set is not necessarily coordinate convex (a pair of generic points in the plane is an example). Define the cell content of $A$ as
$$\Sigma(A) = \sum_P \#\{\text{integer cells in } \mathrm{cconv}(PA)\},$$
where the sum is over all $2^n$ coordinate projections in $\mathbb{R}^n$, including the one 0-dimensional projection, for which the summand is set to be 1. In many applications $A$ will be a convex body, in which case $\mathrm{cconv}(PA) = PA$. The following is the main result of this section.

Theorem 3.1 For every set $F$ in $\mathbb{R}^n$ and $\alpha \ge 2$,
$$N(F, \mathrm{Tower}_\alpha^n) \le \Sigma(CF)^\alpha,$$
where $C$ is an absolute constant.

It is plausible that $\mathrm{Tower}_\alpha^n$ can be replaced by the unit cube, with $\alpha$ replaced by an absolute constant in the right hand side; this is a slightly stronger version of the Covering Conjecture for coordinate convex sets.

The proof of Theorem 3.1, which is a development upon [MV 03], occupies the next few subsections.

Separation on one coordinate. Fix a set $F$ in $\mathbb{R}^n$ which contains more than one point. Using (2.7), we can find a finite subset $A' \subset F$ of cardinality $N(F, \mathrm{Tower}_\alpha^n)$ such that no pair of points from $A'$ lies in a common translate of $\frac12 \mathrm{Tower}_\alpha^n$. Denote $A = 2A'$. Then
$$\forall\, x, y \in A,\ x \ne y: \quad \|x - y\|_{\mathrm{Tower}_\alpha^n} > 1.$$

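Thus, by the definition (2.8) of the norm, for a fixed pair $x \ne y$ there exists a $t > 0$ such that $\mu\{|x - y| > t\} > 1/\theta(t)$. Since $\theta(t) \le 1$ for $t \le 1$, we necessarily have $t > 1$, hence
$$\exists\, t > 0: \quad \mu\{|x - y| > t\} > \frac{1}{\theta_0(t)},$$
where
$$\theta_0(t) = e^{\alpha^t - \alpha}, \quad t \ge 0.$$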

By Chebyshev's inequality,
$$\mathbb{E}_i\, \theta_0(|x(i) - y(i)|) > 1,$$
where $\mathbb{E}_i$ is the expectation according to the uniform distribution of the coordinate $i$ in $\{1, \ldots, n\}$. Let $x$ and $y$ be random points drawn from $A$ independently and according to the uniform distribution on $A$. Then $x \ne y$ with probability $1 - 1/|A| \ge 1/2$, and taking the expectation with respect to $x$ and $y$, we obtain
$$\mathbb{E}_{x,y}\, \mathbb{E}_i\, \theta_0(|x(i) - y(i)|) \ge \frac12.$$

Changing the order of the expectations, we find a realization of the random coordinate $i$ for which
$$\mathbb{E}_{x,y}\, \theta_0(|x(i) - y(i)|) \ge \frac12. \qquad (3.1)$$

Fix this realization.

Recall that a median of a real valued random variable $\xi$ is a number $M$ satisfying $\mathbb{P}(\xi \le M) \ge 1/2$ and $\mathbb{P}(\xi \ge M) \ge 1/2$. Unlike the expectation, the median may not be uniquely defined. We can replace $y(i)$ in (3.1) by a median of $x(i)$ using the following standard observation.

Lemma 3.2 Let $\phi$ be a convex and nondecreasing function on $[0,\infty)$. Let $X$ and $Y$ be independent identically distributed random variables. Then
$$\min_a \mathbb{E}\,\phi(|X - a|) \le \mathbb{E}\,\phi(|X - Y|) \le \min_a \mathbb{E}\,\phi(2|X - a|).$$

Proof. The first inequality follows from Jensen's inequality, applied in $Y$ with $X$ fixed, with $a = \mathbb{E}X = \mathbb{E}Y$. For the second one, the assumptions on $\phi$ imply through the triangle and Jensen's inequalities that for every $a$
$$\phi(|X - Y|) \le \phi(|X - a| + |Y - a|) \le \tfrac12 \phi(2|X - a|) + \tfrac12 \phi(2|Y - a|).$$
Taking the expectations on both sides completes the proof. ■
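Lemma 3.2 is easy to test numerically when $X$ is uniform on a finite list of values, since then all expectations are finite averages. The sketch below checks the Jensen step (with $a = \mathbb{E}X$) and the upper bound at every point of a grid of shifts $a$, for $\phi(t) = t^2$ and for $\theta_0$ with $\alpha = 2$; the data, the grid and the helper names are arbitrary choices made for illustration.

```python
import math
from itertools import product


def theta0(t, alpha=2.0):
    """theta_0(t) = exp(alpha^t - alpha), convex and nondecreasing on [0, inf)."""
    return math.exp(alpha ** t - alpha)


def square(t):
    return t * t


def check_lemma_3_2(xs, phi, shifts):
    """Check Lemma 3.2 for X, Y i.i.d. uniform on the finite list xs."""
    n = len(xs)
    e_xy = sum(phi(abs(x - y)) for x, y in product(xs, xs)) / n ** 2

    def e_shift(a, scale):
        return sum(phi(scale * abs(x - a)) for x in xs) / n

    jensen_ok = e_shift(sum(xs) / n, 1) <= e_xy            # min_a E phi(|X-a|) <= this
    upper_ok = all(e_xy <= e_shift(a, 2) for a in shifts)  # holds for every a
    return jensen_ok and upper_ok


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
shifts = [k / 10 for k in range(-10, 31)]  # a from -1.0 to 3.0
print(check_lemma_3_2(xs, square, shifts), check_lemma_3_2(xs, theta0, shifts))  # True True
```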
Denote by $M$ a median of $x(i)$ over $x \in A$. We conclude that
$$\mathbb{E}_x\, \theta_0(2|x(i) - M|) \ge \frac12. \qquad (3.2)$$

Lemma 3.3 (Separation Lemma) Let $X$ be a random variable with median $M$. Assume that for every real $a$
$$\mathbb{P}\{X \le a\}^{1/\alpha} + \mathbb{P}\{X > a + 1\}^{1/\alpha} \le 1.$$
Then
$$\mathbb{E}\,\theta_0(c|X - M|) < \frac12.$$

In particular, the conclusion implies that the tower norm of the random variable $X - M$ is bounded by an absolute constant.



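Proof. One can assume that $M = 0$. With the notation $p(a) = \mathbb{P}\{X > a\}$, the assumption of the lemma implies that for every $a$
$$(1 - p(a)) + p(a+1)^{1/\alpha} \le (1 - p(a))^{1/\alpha} + p(a+1)^{1/\alpha} \le 1,$$
hence
$$p(a+1) \le p(a)^\alpha, \quad a \in \mathbb{R}.$$
Applying this estimate successively and using $p(0) = 1 - \mathbb{P}\{X \le 0\} \le \frac12$, we obtain
$$p(k) \le 2^{-\alpha^k}, \quad k \in \mathbb{N}.$$
Then for every real number $a \ge 2$
$$p(a) \le p(\lfloor a \rfloor) \le 2^{-\alpha^{\lfloor a \rfloor}} \le 2^{-\alpha^{a-1}} \le 2^{-\alpha^{a/2}}.$$
Repeating this argument for $-X$, we conclude that
$$\mathbb{P}\{|X| > a\} \le 2^{1 - \alpha^{a/2}}, \quad a \ge 2.$$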

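Then
$$\mathbb{P}\{e^{\alpha^{c|X|}} > s\} \le 2^{1 - (\log s)^{1/(2c)}} \le 2\, s^{-\cdots}, \quad s > e^{\alpha^{2c}}.$$
Integrating by parts and using this tail estimate, we have
$$\mathbb{E}\,\theta_0(c|X|) = e^{-\alpha}\, \mathbb{E}\, e^{\alpha^{c|X|}} \le e^{-\alpha}\Big(e^{\alpha^{2c}} + \int_{e^{\alpha^{2c}}}^\infty 2\, s^{-\cdots}\, ds\Big) =: h(\alpha, c).$$
For a fixed $c < 1/4$, the function $h(\alpha, c)$ decreases as a function of $\alpha$ on $[2, \infty)$, and $h(2,0) \approx 0.47 < \frac12$. Hence, for a suitable choice of the absolute constant $c > 0$,
$$h(\alpha, c) \le h(2, c) < \frac12$$
because $\alpha \ge 2$. This completes the proof. ■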

Applying the Separation Lemma to a suitable multiple of the random variable $x(i)$, $x \in A$, together with (3.2), we find an $a \in \mathbb{R}$ such that
$$\mu\{x(i) \le a\}^{1/\alpha} + \mu\{x(i) > a + c\}^{1/\alpha} > 1,$$
where $\mu$ is the uniform measure on $A$. Equivalently, for the subsets $A_-$ and $A_+$ of $A$ defined as
$$A_- = \{x : x(i) \le a\}, \quad A_+ = \{x : x(i) > a + c\} \qquad (3.3)$$
we have
$$\Big(\frac{|A_-|}{|A|}\Big)^{1/\alpha} + \Big(\frac{|A_+|}{|A|}\Big)^{1/\alpha} > 1. \qquad (3.4)$$
Here $|A|$ denotes the cardinality of the set $A$.
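Once the coordinate $i$ is fixed, finding the cut $a$ in (3.3) and (3.4) is a one-dimensional search. The sketch below tries only cuts at the attained values of $x(i)$: lowering $a$ to the largest attained value not exceeding it keeps $A_-$ unchanged and can only enlarge $A_+$, so nothing is lost. The function name and the sample data are hypothetical.

```python
def find_cut(values, alpha, c):
    """Find a with (|A_-|/|A|)^(1/alpha) + (|A_+|/|A|)^(1/alpha) > 1, as in (3.4).

    values lists x(i) over the points x of A; A_- = {x(i) <= a}, A_+ = {x(i) > a + c}.
    Returns (a, |A_-|, |A_+|), or None if no cut satisfies (3.4).
    """
    size = len(values)
    for a in sorted(set(values)):
        n_minus = sum(1 for v in values if v <= a)
        n_plus = sum(1 for v in values if v > a + c)
        if (n_minus / size) ** (1 / alpha) + (n_plus / size) ** (1 / alpha) > 1:
            return a, n_minus, n_plus
    return None


print(find_cut([0, 0, 1, 1], alpha=2, c=0.5))    # (0, 2, 2)
print(find_cut([0, 0, 0, 0.1], alpha=2, c=0.5))  # None
```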

Separating tree. This and the next step are versions of the corresponding steps of [MV 03], where they were written in terms of function classes. Continuing the process of separation for each of $A_-$ and $A_+$, we construct a separating tree of subsets of $A$.

A tree of nonempty subsets of a set $A$ is a finite collection $T$ of subsets of $A$ such that any two elements of $T$ are either disjoint or one contains the other. A son of an element $B \in T$ is a maximal (with respect to inclusion) proper subset of $B$ which belongs to $T$. An element of $T$ with no sons is called a leaf; an element which is not a son of any other element is called a root.

Definition 3.4 Let $A$ be a class of functions on $\Omega$ and $t > 0$. A $t$-separating tree $T$ of $A$ is a tree of subsets of $A$ whose only root is $A$ and such that every element $B$ which is not a leaf has exactly two sons $B_+$ and $B_-$ and, for some coordinate $i \in \Omega$,
$$f(i) \ge g(i) + t \quad \text{for all } f \in B_+,\ g \in B_-.$$

If $|A_-| > 1$, we can repeat the separation on one coordinate for $A_-$ (note that this coordinate may be different from $i$). The same applies to $A_+$. Continuing this process of separation until all the resulting sets are singletons, we arrive at

Lemma 3.5 Let $A \subset \mathbb{R}^n$ be a finite set whose points are 1-separated in the $\mathrm{Tower}_\alpha^n$-norm. Then there exists a $c$-separating tree of $A$ with at least $|A|^{1/\alpha}$ leaves.

This separating tree improves, in a sense, the set $A$, which was already separated. Of course, the leaves of this tree are $c$-separated in the $L_\infty$-norm, but the tree also reveals a pattern in the coordinates on which they are separated. This will be used in the next section, where we further improve the separation of $A$ by constructing in it many copies of a discrete cube (on different subsets of coordinates).

Note, however, that the assumption on $A$, that it is separated in the tower norm, is stronger than being separated in the $L_\infty$-norm.
Proof. We proceed by induction on the cardinality of $A$. The claim is trivially true for singletons. Assume that $|A| > 1$ and that the claim holds for all sets of cardinality smaller than $|A|$. By the separation procedure described above, we can find two subsets $A_-$ and $A_+$ satisfying the conditions of that procedure, including (3.4). The strict inequality in (3.4) implies that the cardinalities of both sets are strictly smaller than $|A|$. By the induction hypothesis, both $A_-$ and $A_+$ have $c$-separating trees $T_-$ and $T_+$ with at least $|A_-|^{1/a}$ and $|A_+|^{1/a}$ leaves respectively.

Now glue the trees $T_-$ and $T_+$ into one tree $T$ of subsets of $A$ by declaring $A$ the root of $T$ and $A_-$ and $A_+$ the sons of $A$. By construction, $f(i) > g(i) + c$ for all $f \in A_+$, $g \in A_-$. Therefore $T$ is a $c$-separating tree of $A$. The number of leaves in $T$ is the sum of the numbers of leaves of $T_-$ and $T_+$, which is at least $|A_-|^{1/a} + |A_+|^{1/a} \ge |A|^{1/a}$ by (3.4). This proves the lemma. ■

Coordinate convexity and counting cells. Recall that $|A| = N(F, \mathrm{Tower}^n)$. We shall prove the following fact which, together with Lemma 3.5, finishes the proof.

Lemma 3.6 Let $A$ be a set in $\mathbb{R}^n$, and let $T$ be a 2-separating tree of $A$. Then

$$\text{number of leaves in } T \le \Sigma(A).$$

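The value 2 is exact here. For example, the open cube $A = (-1, 1)^n$ has $\Sigma(A) = 1$, because no coordinate projection of $A$ contains an integer cell in its coordinate convex hull, so only the 0-dimensional projection with the empty cell is counted. However, for every $\varepsilon > 0$ one easily constructs a $(2 - \varepsilon)$-separating tree of $A$ with $2^n$ leaves.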
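The following Python sketch makes the sharpness example concrete. The function names are illustrative and not from the text. It takes the $2^n$ points of the open cube with coordinates $\pm(1 - \varepsilon/4)$ and splits them one coordinate at a time. At every split it asserts the separation condition of Definition 3.4 with $t = 2 - \varepsilon$. The resulting tree has $2^n$ leaves, even though $\Sigma(A) = 1$.

```python
from itertools import product


def separating_tree_leaves(points, coord, t):
    """Leaves of the tree obtained by splitting on coordinates coord, coord + 1, ...

    Points with positive coordinate go to the son B_+, the others to B_-; each
    split is checked against the condition f(i) > g(i) + t of Definition 3.4.
    """
    if len(points) == 1:
        return 1  # a singleton is a leaf
    plus = [x for x in points if x[coord] > 0]
    minus = [x for x in points if x[coord] <= 0]
    assert min(x[coord] for x in plus) > max(x[coord] for x in minus) + t
    return (separating_tree_leaves(plus, coord + 1, t)
            + separating_tree_leaves(minus, coord + 1, t))


def open_cube_tree_leaves(n, eps):
    """Build a (2 - eps)-separating tree on 2^n points of the open cube (-1, 1)^n."""
    r = 1 - eps / 4  # coordinates +-r lie strictly inside (-1, 1)
    points = list(product((-r, r), repeat=n))
    return separating_tree_leaves(points, 0, 2 - eps)


print(open_cube_tree_leaves(3, 0.1))  # 8, i.e. 2^3 leaves
```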

We ask what it means for a cell to be contained in the coordinate convex hull of a set. A cell $C$ in $\mathbb{R}^n$ defines $2^n$ octants in a natural way. Let $\theta \in \{-1, 1\}^n$ be a choice of signs. A closed octant with the vertex $z \in \mathbb{R}^n$ is the set

$$O_\theta(z) = \{x = (x_1, \dots, x_n) \in \mathbb{R}^n : (x_i - z_i)\,\theta_i \ge 0 \text{ for } i = 1, \dots, n\}.$$

The octants generated by a cell are those which have only one common point with it (a vertex).

Lemma 3.7 Let $A$ be a set in $\mathbb{R}^n$ and $C$ be an integer cell in $\mathbb{R}^n$. Then $C \subset \mathrm{cconv}(A)$ if and only if $A$ intersects all the octants generated by $C$.

The proof is straightforward and we omit it. ■
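The criterion of Lemma 3.7 is easy to test for a finite set. The sketch below assumes a cell is given by its lower corner $z \in \mathbb{Z}^n$, so that $C = z + [0,1]^n$. The octant generated by $C$ for the sign vector $\theta$ then has vertex coordinate $z_i + 1$ where $\theta_i = 1$ and $z_i$ where $\theta_i = -1$. The helper name is hypothetical.

```python
from itertools import product


def cell_in_cconv(points, corner):
    """Octant test of Lemma 3.7 for the integer cell C = corner + [0, 1]^n.

    For each sign vector theta, the octant generated by C is
    {x : x_i >= corner_i + 1 if theta_i = 1, x_i <= corner_i if theta_i = -1};
    C lies in cconv(points) iff every such octant contains a point of the set.
    """
    n = len(corner)
    for theta in product((-1, 1), repeat=n):
        def in_octant(x):
            return all(x[i] >= corner[i] + 1 if s == 1 else x[i] <= corner[i]
                       for i, s in enumerate(theta))
        if not any(in_octant(x) for x in points):
            return False
    return True


square = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(cell_in_cconv(square, (0, 0)))      # True
print(cell_in_cconv(square[:3], (0, 0)))  # False: the octant {x >= 1, y >= 1} is missed
```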

Proof of Lemma 3.6. It will suffice to prove that

$$\text{if } A_- \text{ and } A_+ \text{ are the sons of } A, \text{ then } \Sigma(A_-) + \Sigma(A_+) \le \Sigma(A). \quad (3.5)$$
Indeed, assuming (3.5), one can complete the proof by induction on the cardinality of $A$ as follows. The lemma is trivially true for singletons. Assume that $|A| > 1$ and that the lemma holds for all sets of cardinality smaller than $|A|$. Let $A_-$ and $A_+$ be the sons of $A$. Define $T_-$ to be the collection of sets from $T$ that are contained in $A_-$; then $T_-$ is a separating tree of $A_-$. Do similarly for $T_+$. Since both $A_-$ and $A_+$ have cardinalities smaller than $|A|$, the induction hypothesis applies to them. Hence by (3.5) we have

$$\Sigma(A) \ge \Sigma(A_-) + \Sigma(A_+) \ge (\text{number of leaves in } T_-) + (\text{number of leaves in } T_+) = \text{number of leaves in } T.$$

This proves the lemma, so the only remaining thing is to prove (3.5).

In the proof of (3.5), when it creates no confusion, we will denote by $\Sigma(A)$ not only the cardinality, but also the set of all pairs $(P, C)$ for which $C \subset \mathrm{cconv}(PA)$. For this to be consistent, we introduce a 0-dimensional cell $\emptyset$, and always assume that the 0-dimensional projection along with the empty cell is in $\Sigma(A)$ provided $A$ is nonempty.

Clearly, $\Sigma(A_-) \cup \Sigma(A_+) \subseteq \Sigma(A)$. To complete the proof, it will be enough to construct an injective mapping $\Phi$ from $\Sigma(A_-) \cap \Sigma(A_+)$ into $\Sigma(A) \setminus (\Sigma(A_-) \cup \Sigma(A_+))$. We will do this by gluing identical cells from $\Sigma(A_-) \cap \Sigma(A_+)$ into a larger cell; this idea goes back to [ABCH].
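The counting behind this reduction is written out below. Here $\Sigma(\cdot)$ denotes sets of pairs, as in the convention just introduced, and $\Phi$ is the injective map still to be constructed. The computation shows that the existence of $\Phi$ gives (3.5).

```latex
% Sigma(A_-) \cup Sigma(A_+) and the image of Phi are disjoint subsets of Sigma(A);
% Phi is injective; the last line is inclusion-exclusion.
\begin{align*}
|\Sigma(A)| &\ge |\Sigma(A_-) \cup \Sigma(A_+)| + |\Phi(\Sigma(A_-) \cap \Sigma(A_+))| \\
            &= |\Sigma(A_-) \cup \Sigma(A_+)| + |\Sigma(A_-) \cap \Sigma(A_+)| \\
            &= |\Sigma(A_-)| + |\Sigma(A_+)|.
\end{align*}
```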

Fix a pair $(P, C) \in \Sigma(A_-) \cap \Sigma(A_+)$. Without loss of generality, we may assume that $A_-$ and $A_+$ are 2-separated on the first coordinate. Then there exists an integer $a$ such that

$$x(1) < a \ \text{ for } x \in A_-, \qquad x(1) > a + 1 \ \text{ for } x \in A_+. \quad (3.6)$$

The coordinate projection $P$ must annihilate the first coordinate. Otherwise (3.6) would imply that the coordinate convex hulls of $PA_-$ and $PA_+$ are separated in the first coordinate, and hence disjoint. This would contradict our assumption that they both contain the cell $C$.

Trivial case: $\operatorname{rank} P = 0$. In this case, let $P'$ be the coordinate projection that annihilates all the coordinates except the first. Since both $A_-$ and $A_+$ are nonempty, $P'A$ contains points for which $x(1) < a$ and $x(1) > a + 1$. Hence $\mathrm{cconv}(P'A)$ contains the one-dimensional cell $C' = [a, a + 1]$. So we can define the action of $\Phi$ on the trivial pair as $\Phi : (P, \emptyset) \mapsto (P', C')$.

Nontrivial case: $\operatorname{rank} P > 0$. Without loss of generality we may assume that $P$ retains the coordinates $\{2, 3, \dots, k\}$ with some $2 \le k \le n$, and annihilates the others. Let $P'$ be the coordinate projection onto $\mathbb{R}^k$ that retains the coordinates $\{1, \dots, k\}$, so $C' = [a, a + 1] \times C$ is a cell in $\mathbb{R}^k$. We claim that $(P', C') \in \Sigma(A)$.

By the assumption, the cell $C$ lies in both $\mathrm{cconv}(PA_-)$ and $\mathrm{cconv}(PA_+)$. In light of Lemma 3.7, $PA_-$ and $PA_+$ each intersect all the octants generated by $C$. We need to show that $P'A$ intersects any octant $O'$ generated by $C'$. This octant must be of the form either $O' = \{x \in \mathbb{R}^k : x(1) \le a,\ Px \in O\}$ or $O' = \{x \in \mathbb{R}^k : x(1) \ge a + 1,\ Px \in O\}$, where $O$ is some octant generated by the cell $C$.

Assume the second option holds. Pick a point $z \in A_+$ such that $Pz \in PA_+ \cap O$. Then $P'z(1) = z(1) > a + 1$, so $P'z \in P'A_+ \cap O'$. A similar argument (with $A_-$) works if $O'$ is of the first form. This proves the claim, and we again define the action of $\Phi$ as $\Phi : (P, C) \mapsto (P', C')$.



To check that the range of $\Phi$ is disjoint from both $\Sigma(A_-)$ and $\Sigma(A_+)$, assume that the pair $(P', C')$ constructed above is in $\Sigma(A_-)$. This means that $C'$ lies in $\mathrm{cconv}(QA_-)$ for some coordinate projection $Q$. This projection must retain the first coordinate, because by construction the cell $C'$ is nondegenerate in the first coordinate. Since $x(1) < a$ for all $x \in A_-$, the same must hold for all $x \in Q(A_-)$, and hence also for all $x \in \mathrm{cconv}(QA_-)$. On the other hand, $C'$ clearly contains points with $x(1) = a + 1 > a$. Hence $C'$ cannot lie in $\mathrm{cconv}(QA_-)$. A similar argument works for $\Sigma(A_+)$. Therefore the range of $\Phi$ is as claimed.

Finally, $\Phi$ is trivially injective because the map $C \mapsto C'$ is injective. ■
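For tiny finite sets, $\Sigma(A)$ can be counted by brute force and (3.5) checked on examples. The Python sketch below is only an illustration: it is exponential in $n$ and its helper names are hypothetical. It enumerates coordinate projections and integer cells and applies the octant test of Lemma 3.7. Following the convention above, the rank-0 projection with the empty cell is counted once for a nonempty set.

```python
import math
from itertools import combinations, product


def cell_in_cconv(points, coords, corner):
    """Lemma 3.7 test for the projection onto `coords` (empty coords: any point works)."""
    for theta in product((-1, 1), repeat=len(coords)):
        if not any(all(x[c] >= z + 1 if s == 1 else x[c] <= z
                       for c, z, s in zip(coords, corner, theta))
                   for x in points):
            return False
    return True


def sigma(points):
    """Sigma(A): number of pairs (P, C) with P a coordinate projection and C an
    integer cell in cconv(PA), including the rank-0 projection with the empty cell."""
    n = len(points[0])
    total = 0
    for r in range(n + 1):
        for coords in combinations(range(n), r):
            # a cell [z, z + 1] in coordinate c needs points with x(c) <= z and x(c) >= z + 1
            ranges = [range(math.ceil(min(x[c] for x in points)),
                            math.floor(max(x[c] for x in points) - 1) + 1)
                      for c in coords]
            total += sum(cell_in_cconv(points, coords, z) for z in product(*ranges))
    return total


# A_- and A_+ are 2-separated on the first coordinate (3 > 0 + 2).
a_minus = [(0, 0), (0, 1)]
a_plus = [(3, 0), (3, 1)]
print(sigma(a_minus), sigma(a_plus), sigma(a_minus + a_plus))  # 2 2 8
```

For this pair, $\Sigma(A_-) + \Sigma(A_+) = 4 \le 8 = \Sigma(A)$, as (3.5) requires.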

Theorem 3.1 follows from Lemma 3.5 and Lemma 3.6.

Remark. The proof does not use the fact that the probability measure on $\Omega = \{1, \dots, n\}$, underlying the tower space, is uniform. In fact, Theorem 3.1 holds for any probability measure on $\{1, \dots, n\}$. This will help us in the next section.

4 Covering by Ellipsoids and Cubes 

The Covering Conjecture holds if we cover by ellipsoids containing the unit cube rather than by the unit cube itself. This nontrivial fact is a consequence of Theorem 3.1.

[Figure: Nontrivial case: gluing two copies of $C$ into a larger cell $C'$ between the levels $a$ and $a + 1$ of the first coordinate.]
Theorem 4.1 Let $A$ be a set in $\mathbb{R}^n$ and $D$ be an ellipsoid containing the cube $[0, 1]^n$. Then

$$N(A, D) \le \Sigma(CA)^2$$

where $C$ is an absolute constant.

This result will be used in a later section to find nice sections of convex bodies.

Proof. Translating the ellipsoid $D$, we can assume that $2D$ contains the cube $[-1, 1]^n$, which is the unit ball of the space $\ell_\infty^n$. Call $X$ the normed space $(\mathbb{R}^n, \|\cdot\|_{2D})$. Then $X$ is isometric to $\ell_2^n$. Let $T : \ell_\infty^n \to X$ be the formal identity map and $S : X \to \ell_2^n$ be an isometry. Finally, define $u = ST : \ell_\infty^n \to \ell_2^n$ and note that $\|u\| \le 1$. Recall that every linear operator $u : \ell_\infty^n \to \ell_2^n$ is 2-summing and its 2-summing norm $\pi_2(u)$ satisfies $\pi_2(u) \le \sqrt{\pi/2}\,\|u\|$, see [TJ] Corollary 10.10. Thus $\pi_2(u) \le \sqrt{\pi/2}$. By Pietsch's factorization theorem (see [TJ] Theorem 9.3) there exists a probability measure $\mu$ on $\Omega = \{1, \dots, n\}$ such that for all $x \in \mathbb{R}^n$



$$\|ux\|_2 \le \sqrt{\pi/2}\, \|x\|_{L_2(\Omega, \mu)}.$$

Since $\|ux\|_2 = \|S^{-1}ux\|_X = \|Tx\|_X = \|x\|_X$, we have

$$\sqrt{2/\pi}\, \|x\|_X \le \|x\|_{L_2(\Omega, \mu)}. \quad (4.1)$$



On the other hand, the norm of the Lorentz space generated by $\varphi_2(t) = e^{t^2}$ clearly dominates the $L_2$ norm: for every $x \in \mathbb{R}^n$,

$$\|x\|_{L_2(\Omega, \mu)} \le C \|x\|_{\Lambda_{\varphi_2}(\Omega, \mu)} \quad (4.2)$$

where $C$ is an absolute constant. Denoting by $\mathrm{Tower}_2^n(\mu)$ the unit ball of the norm in the right hand side of (4.2), we conclude from (4.1) and (4.2) that

$$\mathrm{Tower}_2^n(\mu) \subset C' D$$

where $C'$ is an absolute constant. Then by Theorem 3.1 and the remark after its proof,

$$N(A, D) \le N(C'A, \mathrm{Tower}_2^n(\mu)) \le \Sigma(C''A)^2$$

where $C''$ is an absolute constant. ■

The next theorem is a partial positive solution to the Covering Conjecture itself. We prove the conjecture with a mildly growing exponent.
Theorem 4.2 Let $A$ be a set in $\mathbb{R}^n$ and $\varepsilon > 0$. Then for the integer cell $Q = [0, 1]^n$

$$N(A, Q) \le \Sigma(C\varepsilon^{-1}A)^M$$

with $M = 4\log^{\varepsilon}(e + n/\log N(A, Q))$, and where $C$ is an absolute constant.

In particular, this proves the Covering Conjecture in the case when the covering number is exponential in $n$: if $N(A, Q) \ge \exp(\lambda n)$, $\lambda < 1/2$, then $M \le C\log^{\varepsilon}(1/\lambda)$.

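For the proof of the theorem, we first cover $A$ by towers, and then towers by cubes. Formally,

$$N(A, Q) \le N(A, \varepsilon\,\mathrm{Tower}^n)\, N(\varepsilon\,\mathrm{Tower}^n, Q) = N(\varepsilon^{-1}A, \mathrm{Tower}^n)\, N(\mathrm{Tower}^n, \varepsilon^{-1}Q). \quad (4.3)$$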

Lemma 4.3 For every t > 4, 

iV(Tower",iQ) < exp(Ce-^"*^'n) 
where C is an absolute constant. 

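Proof. We count the integer points in the tower. For $x \in \mathbb{R}^n$, define a point $x' \in \mathbb{Z}^n$ by $x'(i) = \operatorname{sign}(x(i))\lfloor |x(i)| \rfloor$. Every point $x \in 2t^{-1}\mathrm{Tower}^n$ is covered by the cube $x' + [-1, 1]^n$, so

$$N = N(\mathrm{Tower}^n, tQ) = N(2t^{-1}\mathrm{Tower}^n, 2Q) \le |\{x' : x \in 2t^{-1}\mathrm{Tower}^n\}| \le |2t^{-1}\mathrm{Tower}^n \cap \mathbb{Z}^n|.$$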
For every $x \in 2t^{-1}\mathrm{Tower}^n \cap \mathbb{Z}^n$,

|{^: |a:(i)|=i}|<e-"*^'''+"n=:A;,-, j G N. 
Let $J$ be the largest number $j$ such that $k_j \ge 1$. Then

$$N \le \prod_{j=1}^{J} \binom{n}{k_j} 2^{k_j},$$

as for every $j$ there are at most $\binom{n}{k_j}$ ways to choose the level set $\{i : |x(i)| = j\}$, and at most $2^{k_j}$ ways to choose the signs of $x(i)$.

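Let $\beta_j = k_j/n$. Since $a \ge 2$ and $t \ge 2$, $\beta_j < 1/4$. Then $\binom{n}{k_j} \le (e/\beta_j)^{\beta_j n} \le \exp(C\sqrt{\beta_j}\, n)$, and the factor $2^{k_j} = \exp(\beta_j n \log 2)$ obeys the same bound. Hence

$$N \le \exp\Big(C \sum_{j=1}^{J} \beta_j^{1/2}\, n\Big) \le \exp(C_2\, \beta_1^{1/2}\, n),$$

and substituting the definition of $k_1 = \beta_1 n$ gives the bound of the lemma. This completes the proof. ■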
Proof of Theorem 4.2. We can assume that $0 < \varepsilon < c$, where $c > 0$ is any absolute constant. We estimate the second factor in (4.3) by Lemma 4.3 with $a = M/2$. With this choice of $a$ and the value of $M$, a direct computation gives

$$N(\mathrm{Tower}^n, \varepsilon^{-1}Q) \le N(A, Q)^{1/2}.$$

Then (4.3) yields $N(A, Q) \le N(\varepsilon^{-1}A, \mathrm{Tower}^n)\, N(A, Q)^{1/2}$. Rearranging and applying Theorem 3.1, we obtain

$$N(A, Q) \le N(\varepsilon^{-1}A, \mathrm{Tower}^n)^2 \le \Sigma(C\varepsilon^{-1}A)^M.$$

The proof is complete. ■
Theorem 4.2 applies to a combinatorial problem studied by N. Alon et al. [ABCH].

Theorem 4.4 Let $F$ be a class of functions on an $n$-point set $\Omega$ with the uniform probability measure $\mu$. Assume $F$ is 1-bounded in $L_1(\Omega, \mu)$. Then for $0 < \varepsilon < 1$ and for $0 < t < 1/2$

$$D_\infty(F, t) \le C v \log(n/vt) \cdot \log^{\varepsilon}(2n/v) \quad (4.4)$$

where $v = v(F, c\varepsilon t)$.

N. Alon et al. [ABCH] proved under a somewhat stronger assumption ($F$ is 1-bounded in $L_\infty$) that

$$D_\infty(F, t) \le C v \log(n/vt) \cdot \log(n/t^2), \quad \text{where } v = v(F, ct). \quad (4.5)$$

Thus $D_\infty(F, t) = O(\log^2 n)$. It was asked in [ABCH] whether the exponent 2 can be reduced to some constant between 1 and 2. Theorem 4.4 answers this in the positive. It remains open whether the exponent can be made 1. A partial case of Theorem 4.4, for $\varepsilon = 2$ and for uniformly bounded classes, was proved in [MV 02].

It is important that, unlike in (4.5), the size of the domain $n$ appears in (4.4) always in the ratio $n/v$. Assume, for example, that one knows a priori that the entropy is large: for some constant $0 < \alpha < 1/2$

$$D_\infty(F, t) \ge \alpha n.$$

Then by (4.4) we have $\alpha n \le C v \log(n/vt) \cdot \log^{\varepsilon}(2n/v)$. Dividing by $n$ and solving for $n/v$, we get

$$n/v \le \frac{C}{\alpha} \log^{1+\varepsilon}\Big(\frac{C}{\alpha t}\Big),$$

and putting this back into (4.4) we obtain

$$D_\infty(F, t) \le C v \log\Big(\frac{1}{\alpha t}\Big) \cdot \log^{\varepsilon}\Big(\frac{1}{\alpha} \log\frac{1}{\alpha t}\Big).$$

We see that $n$, the size of the domain $\Omega$, disappeared from the entropy estimate. Such domain-free bounds, to which we shall return in the next section, are possible only because $n$ enters into the entropy estimate (4.4) in the ratio $n/v$.

To prove Theorem 4.4 we identify the $n$-point domain $\Omega$ with $\{1, \dots, n\}$ and realize the class of functions $F$ as a subset of $\mathbb{R}^n$ via the map $f \mapsto (f(i))_{i=1}^n$. The geometric meaning of the combinatorial dimension of $F$ is then the following.

Definition 4.5 The combinatorial dimension $v(A)$ of a set $A$ in $\mathbb{R}^n$ is the maximal rank of a coordinate projection $P$ in $\mathbb{R}^n$ such that $\mathrm{cconv}(PA)$ contains an integer cell.

This agrees with the classical Vapnik-Chervonenkis definition for sets $A \subset \{0, 1\}^n$, for which $v(A)$ is defined as the maximal rank of a coordinate projection $P$ such that $PA = P(\{0, 1\}^n)$.
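For small finite sets, Definition 4.5 can be evaluated by brute force. The Python sketch below uses hypothetical names and has exponential running time. It searches for the largest set of coordinates whose projection contains an integer cell in its coordinate convex hull, using the octant criterion of Lemma 3.7. On subsets of $\{0,1\}^n$ it returns the Vapnik-Chervonenkis dimension.

```python
import math
from itertools import combinations, product


def cell_in_cconv(points, coords, corner):
    """Octant criterion (Lemma 3.7) for the projection of `points` onto `coords`."""
    for theta in product((-1, 1), repeat=len(coords)):
        if not any(all(x[c] >= z + 1 if s == 1 else x[c] <= z
                       for c, z, s in zip(coords, corner, theta))
                   for x in points):
            return False
    return True


def comb_dim(points):
    """v(A): largest rank of a coordinate projection P such that cconv(PA)
    contains an integer cell (Definition 4.5); brute force, small n only."""
    n = len(points[0])
    for r in range(n, 0, -1):
        for coords in combinations(range(n), r):
            ranges = [range(math.ceil(min(x[c] for x in points)),
                            math.floor(max(x[c] for x in points) - 1) + 1)
                      for c in coords]
            if any(cell_in_cconv(points, coords, z) for z in product(*ranges)):
                return r
    return 0


cube = list(product((0, 1), repeat=3))
weight_le_1 = [x for x in cube if sum(x) <= 1]
print(comb_dim(cube), comb_dim(weight_le_1))  # 3 1, the VC dimensions of these classes
```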

Lemma 4.6 $v(F, 1) = v(F)$, where $F$ is treated as a function class in the left hand side and as a subset of $\mathbb{R}^n$ in the right hand side.

Proof. By the definition, $v(F, 1)$ is the maximal cardinality of a subset $\sigma$ of $\{1, \dots, n\}$ which is 1-shattered by $F$. Being 1-shattered means that there exists a point $h \in \mathbb{R}^\sigma$ such that for every partition $\sigma = \sigma_- \cup \sigma_+$ one can find a point $f \in F$ with $f(i) \le h(i)$ if $i \in \sigma_-$ and $f(i) \ge h(i) + 1$ if $i \in \sigma_+$. This means exactly that $P_\sigma F$ intersects each octant generated by the cell $C = h + [0, 1]^\sigma$, where $P_\sigma$ denotes the coordinate projection in $\mathbb{R}^n$ onto $\mathbb{R}^\sigma$. By Lemma 3.7 this means that $C \subset \mathrm{cconv}(P_\sigma F)$. Hence $v(F, 1) = v(F)$. ■

For further use, we will prove Theorem 4.4 under a weaker assumption, namely that $F$ is 1-bounded in $L_p(\mu)$ for some $0 < p < \infty$. When $F$ is realized as a set in $\mathbb{R}^n$, this assumption means that $F$ is a subset of the unit ball of $L_p^n$, which is

$$\mathrm{Ball}(L_p^n) = \Big\{x \in \mathbb{R}^n : \sum_{i=1}^n |x(i)|^p \le n\Big\}.$$

We will apply to $F$ the covering Theorem 4.2 and then estimate $\Sigma$ as follows.
Lemma 4.7 Let $A$ be a subset of $\alpha \cdot \mathrm{Ball}(L_p^n)$ for some $\alpha \ge 1$ and $0 < p < \infty$. Then

where $v = v(A)$, $C_1(p) = C(1 + 1/p)$ and $C_2(p) = 1 + 1/p$.
Proof. We look at

$$\Sigma(A) = \sum_P \text{number of integer cells in } \mathrm{cconv}(PA)$$

and notice that by Lemma 4.6, $\operatorname{rank} P \le v(A) = v$ for all $P$ in this sum. Since the number of integer cells in a set is always bounded by its volume,

$$\Sigma(A) \le \sum_P \mathrm{vol}(\mathrm{cconv}(PA)) \le \sum_P \mathrm{vol}(P(\alpha \cdot \mathrm{Ball}(L_p^n)))$$

where the volumes are considered in the corresponding subspaces $P(\mathbb{R}^n)$. By the symmetry of $L_p^n$, the summands with the same $\operatorname{rank} P$ in the last sum are equal. Then the sum equals

$$1 + \sum_{k=1}^{v} \binom{n}{k} \alpha^k\, \mathrm{vol}_k(P_k(\mathrm{Ball}(L_p^n))) \quad (4.6)$$



where $P_k$ denotes the coordinate projection in $\mathbb{R}^n$ onto $\mathbb{R}^k$. Note that $P_k(\mathrm{Ball}(L_p^n)) = (n/k)^{1/p}\,\mathrm{Ball}(L_p^k)$ and recall that $\mathrm{vol}(\mathrm{Ball}(L_p^k)) \le C_1(p)^k$, see [Pi] (1.18). Then the volumes in (4.6) are bounded by $(n/k)^{k/p} C_1(p)^k \le (C_1(p)n/k)^{C_2(p)k}$. The binomial coefficients in (4.6) are estimated via Stirling's formula as $\binom{n}{k} \le (en/k)^k$. Then (4.6) is bounded by

$$1 + \sum_{k=1}^{v} \Big(\frac{en}{k}\Big)^k \alpha^k \Big(\frac{C_1(p)n}{k}\Big)^{C_2(p)k}.$$

This completes the proof. ■
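The binomial estimate used above is easy to confirm numerically. This Python sketch checks $\binom{n}{k} \le (en/k)^k$, which follows from $k! \ge (k/e)^k$. It compares logarithms so that large $n$ does not overflow floating point.

```python
import math


def log_binomial_bound(n, k):
    """Logarithm of (e n / k)^k, the Stirling-type bound for binom(n, k) used in Lemma 4.7."""
    return k * math.log(math.e * n / k)


def binomial_bound_holds(n_values=(10, 100, 1000)):
    """Check log binom(n, k) <= k log(e n / k) for all 1 <= k <= n (logs avoid overflow)."""
    for n in n_values:
        for k in range(1, n + 1):
            if math.log(math.comb(n, k)) > log_binomial_bound(n, k) + 1e-9:
                return False
    return True


print(binomial_bound_holds())  # True
```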

Proof of Theorem 4.4. Viewing $F$ as a set in $\mathbb{R}^n$, we notice from (2.6) and (2.7) that

$$D_\infty(F, t) \le \log N(F, 2tQ) \le D_\infty(F, t/2) \quad (4.7)$$

where $Q = [0, 1]^n$. Therefore it is enough to estimate $N = N(F, 2tQ)$. We apply successively the covering Theorem 4.2 and Lemma 4.7 with $p = 1$:

$$N = N\Big(\frac{1}{2t}F, Q\Big) \le \Sigma\Big(\frac{C}{\varepsilon t}F\Big)^M \le \Big(\frac{Cn}{\varepsilon v t}\Big)^{CMv} \quad (4.8)$$
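where $v = v(\frac{C}{\varepsilon t}F) = v(F, c\varepsilon t)$ and $M = 4\log^{\varepsilon}(e + n/\log N)$. Define the number $\alpha > 0$ by $N = \exp(\alpha n)$. Then $M = 4\log^{\varepsilon}(e + 1/\alpha)$, and taking logarithms in (4.8) we have $\alpha n \le CMv \log(\frac{Cn}{\varepsilon v t})$. Dividing by $Mn$, we obtain

$$\frac{\alpha}{\log^{\varepsilon}(e + 1/\alpha)} \le \frac{Cv}{n} \log\Big(\frac{Cn}{\varepsilon v t}\Big).$$

This implies

$$\alpha \le \frac{Cv}{n} \log\Big(\frac{Cn}{\varepsilon v t}\Big) \log^{\varepsilon}\Big(\frac{Cn}{v} \Big/ \log\frac{Cn}{\varepsilon v t}\Big) \le \frac{Cv}{n} \log\Big(\frac{Cn}{\varepsilon v t}\Big) \log^{\varepsilon}\Big(\frac{Cn}{v}\Big)$$

and multiplying by $n$ we obtain

$$\log N \le C v \log(Cn/v\varepsilon t) \cdot \log^{\varepsilon}(Cn/v). \quad (4.9)$$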

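It remains to remove $\varepsilon$ from the denominator by a routine argument. Consider the function

$$\phi(\varepsilon) = \log^{\varepsilon}(Cn/v), \quad \text{where } v = v(\varepsilon) \text{ as before}.$$

As $\varepsilon$ decreases to zero, $v(\varepsilon)$ increases, thus $\phi(\varepsilon)$ decreases to 1. Define $\varepsilon_0$ so that $\phi(\varepsilon_0) = e$.

Case 1. Assume that $\varepsilon \ge \varepsilon_0$. Then $\phi(\varepsilon) \ge e$, thus $\varepsilon \ge 1/\log\log(Cn/v)$, so $Cn/v\varepsilon t \le (Cn/vt)^2$. Using this in (4.9) we obtain

$$\log N \le C v \log(Cn/vt) \cdot \log^{\varepsilon}(Cn/v). \quad (4.10)$$

Case 2. Let $\varepsilon < \varepsilon_0$. Then $\phi(\varepsilon) < e$, so applying (4.9) with $\varepsilon_0$ in place of $\varepsilon$ we get

$$\log N \le C v(\varepsilon_0) \log(Cn/v(\varepsilon_0)\varepsilon_0 t) \cdot e. \quad (4.11)$$

As in Case 1, we have $Cn/v(\varepsilon_0)\varepsilon_0 t \le (Cn/v(\varepsilon_0)t)^2$. Using this in (4.11), we obtain

$$\log N \le C' v(\varepsilon_0) \log(Cn/v(\varepsilon_0)t) \le C' v \log(Cn/vt),$$

because $v(\varepsilon_0) \le v(\varepsilon) = v$. In particular, we have (4.10) also in this case. In view of (4.7), this completes the proof. ■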
5 Covering by balls of Lorentz spaces



So far we imposed no assumptions on the set $A \subset \mathbb{R}^n$ which we covered. If $A$ happens to be bounded in some norm $\|\cdot\|$, a new phenomenon occurs. The covering numbers of $A$ by balls in any norm slightly weaker than $\|\cdot\|$ become independent of the dimension $n$; the parameter that essentially controls them is the combinatorial dimension of $A$.

This phenomenon is best expressed in the functional setting for Lorentz norms (2.8), because they are especially easy to compare. Given two generating functions $\varphi$ and $\psi$, we look at their comparison function

$$(\varphi|\psi)(t) = \sup\{\varphi(s) \mid \varphi(s) \ge \psi(ts)\}.$$

Fix a probability space $(\Omega, \mu)$. The comparison function helps us measure to what extent the norm in $\Lambda_\varphi = \Lambda_\varphi(\Omega, \mu)$ is weaker than the norm in $\Lambda_\psi = \Lambda_\psi(\Omega, \mu)$. Just for the normalization, we assume that

$$\varphi(1) = \psi(1) = 1. \quad (5.1)$$

Let $0 < \alpha < \infty$. We rule out the extremal case by assuming that

$$\varphi(s) \le e^{s^\alpha} \quad \text{for } s \ge 1. \quad (5.2)$$

Theorem 5.1 Let $\varphi$ and $\psi$ be generating functions satisfying (5.1) and (5.2). Let $F$ be a class of functions 1-bounded in $\Lambda_\psi$. Then for $0 < t < 1/2$

$$D(F, \Lambda_\varphi, t) \le C_\alpha\, v(F, ct) \cdot \log(\varphi|\psi)(t/2).$$

Remarks. 1. No nontrivial estimate is possible when $\varphi = \psi$. Indeed, even in the simplest case when $\Omega$ is finite and $\mu$ is uniform, let us take $F$ to be the collection of the functions $f_\omega = \delta_\omega/\|\delta_\omega\|_{\Lambda_\psi}$, $\omega \in \Omega$, where $\delta_\omega$ is the function that takes value 1 at $\omega$ and 0 elsewhere. Clearly, $F$ is 1-bounded in $\Lambda_\psi$ and has combinatorial dimension $v(F, t) = 1$ for any $0 < t < 1$. However, $\|f_\omega - f_{\omega'}\|_{\Lambda_\psi} \ge 1$ for $\omega \ne \omega'$. Hence $D(F, \Lambda_\varphi, 1/2) = \log|F| = \log|\Omega|$. This can be arbitrarily large.
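The example in Remark 1 can be computed directly. The definition (2.8) of the Lorentz norm is not part of this excerpt, so the sketch below assumes the weak-type form suggested by the proof of Lemma 5.7: $\|f\|_{\Lambda_\psi} \le \lambda$ if and only if $\mu\{|f| > \lambda s\} \le 1/\psi(s)$ for all $s > 0$. For $\psi(s) = s^p$ and the uniform measure on $n$ points, this becomes $\max_k f^*_k (k/n)^{1/p}$, where $f^*$ is the decreasing rearrangement of $|f|$. The function names are hypothetical.

```python
def weak_lp_norm(f, p):
    """Assumed weak-type norm generated by psi(s) = s^p, uniform measure on len(f) points.

    ||f|| <= lam iff mu{|f| > lam * s} <= 1/psi(s) for all s > 0, which for n points
    reduces to max_k f*_k (k/n)^(1/p), with f* the decreasing rearrangement of |f|.
    """
    n = len(f)
    decreasing = sorted((abs(v) for v in f), reverse=True)
    return max(v * (k / n) ** (1 / p) for k, v in enumerate(decreasing, start=1))


def delta(omega, n):
    """Indicator function of the point omega in {0, ..., n-1}."""
    return [1.0 if i == omega else 0.0 for i in range(n)]


n, p = 16, 2.0
c = 1 / weak_lp_norm(delta(0, n), p)  # normalizing constant for f_omega
f0 = [c * v for v in delta(0, n)]
f1 = [c * v for v in delta(1, n)]
print(weak_lp_norm(f0, p))                               # 1.0: F is 1-bounded
print(weak_lp_norm([a - b for a, b in zip(f0, f1)], p))  # about 1.414 = 2^(1/p) >= 1
```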

2. To see the sharpness of Theorem 5.1, notice that for some probability measure $\mu$ on $\Omega$,

$$D(F, \Lambda_\varphi, t) \ge c\, v(F, Ct).$$

A simple argument can be found in [T 02], Proposition 1.4.

In the extremal case of Theorem 5.1, when $F$ is 1-bounded in $L_\infty$, the comparison function becomes just $\varphi(1/t)$, which gives
Corollary 5.2 Let $\varphi$ be a generating function satisfying (5.2) and such that $\varphi(1) = 1$. Let $F$ be a class of functions 1-bounded in $L_\infty$. Then for $0 < t < 1/2$

$$D(F, \Lambda_\varphi, t) \le C_\alpha\, v(F, ct) \cdot \log\varphi(2/t).$$

We use Theorem 5.1 for the classical Lorentz spaces $L_{p,\infty} = L_{p,\infty}(\Omega, \mu)$ generated by $\varphi(t) = t^p$.

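Corollary 5.3 Let $1 < p < q < \infty$. Let $F$ be a class of functions 1-bounded in $L_{q,\infty}$. Then for $0 < t < 1/2$

$$D(F, L_{p,\infty}, t) \le C_{p,q}\, v(F, ct) \cdot \log(1/t)$$

where

$$C_{p,q} = C_p\, \frac{pq}{q - p}.$$

Proof. We apply Theorem 5.1 to the functions $\varphi(t) = t^p$ and $\psi(t) = t^q$. In this case the comparison function becomes $(\varphi|\psi)(t) = t^{pq/(p-q)}$, so $\log(\varphi|\psi)(t/2) = \frac{pq}{q-p}\log(2/t) \le \frac{2pq}{q-p}\log(1/t)$ for $0 < t < 1/2$. To complete the proof, notice that (5.2) holds with $\alpha = p$. ■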
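The comparison function of two power functions can be checked numerically against the closed form. This sketch uses $(\varphi|\psi)(t) = \sup\{\varphi(s) : \varphi(s) \ge \psi(ts)\}$, our reading of the definition at the start of this section. It also assumes that, as for powers, the admissible $s$ form an interval beginning at $s = 1$, so the endpoint can be found by bisection. The helper is hypothetical.

```python
def comparison_function(phi, psi, t, s_hi=1e12, iters=200):
    """Approximate (phi|psi)(t) = sup{phi(s) : phi(s) >= psi(t * s)} by bisection.

    Assumes the admissible s form an interval containing s = 1 (true for powers
    with phi(1) = psi(1) = 1 and 0 < t < 1) and that s_hi is not admissible.
    """
    lo, hi = 1.0, s_hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if phi(mid) >= psi(t * mid):
            lo = mid
        else:
            hi = mid
    return phi(lo)


p, q, t = 1.0, 2.0, 0.25
numeric = comparison_function(lambda s: s ** p, lambda s: s ** q, t)
closed_form = t ** (p * q / (p - q))
print(numeric, closed_form)  # both approximately 16
```

The test below also confirms $(\varphi|\psi)(t) \ge 1/t$, which is used at the end of the proof of Theorem 5.1.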

Our main interest is in the $L_p$ spaces, for which we obtain

Corollary 5.4 Let $1 < p < q < \infty$. Let $F$ be a class of functions 1-bounded in $L_q$. Then for $0 < t < 1/2$

$$D(F, L_p, t) \le C_{p,q}\, v(F, c_{p,q} t) \cdot \log(1/c_{p,q} t) \quad (5.3)$$

where 

^P'i = ^{f:^)^ Cp,, = cmin|^l,(^^^ y 

In the next section, this estimate will be applied in an important partial case, when $p$ is a nontrivial proportion of $q$. In that case, say if $p \le 0.99q$, inequality (5.3) reads

$$D(F, L_p, t) \le C p^2\, v(F, ct) \cdot \log(1/t). \quad (5.4)$$

The history of estimates obtained prior to Corollary 5.4 and (5.4) is outlined in Section 1 after (1.7).

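Proof. Since $F$ is 1-bounded in $L_q$, it is also 1-bounded in $L_{q,\infty}$. Let $p'$ be such that $p < p' < q$. Fix an $f \in L_{p',\infty}$ with $\|f\|_{p',\infty} \le 1$, so that $\mu\{|f| > t\} \le t^{-p'}$ for all $t > 0$. Then

$$\|f\|_p^p = \int_0^\infty p t^{p-1} \mu\{\omega : |f(\omega)| > t\}\, dt \le \int_0^1 p t^{p-1}\, dt + p \int_1^\infty t^{p-1-p'}\, dt = 1 + \frac{p}{p'-p} = \frac{p'}{p'-p}.$$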
Taking the $p$-th root we conclude that if $f, g$ are $t$-separated in $L_p$, then they are $(b_{p,p'}t)$-separated in $L_{p',\infty}$, where

$$b_{p,p'} = \Big(\frac{p'-p}{p'}\Big)^{1/p}.$$

Thus $D(F, L_p, t) \le D(F, L_{p',\infty}, b_{p,p'}t)$. Then the application of Corollary 5.3 with $p'$ and $q$ gives

$$D(F, L_p, t) \le B_{p',q}\, v(F, c\, b_{p,p'}t) \cdot \log(1/b_{p,p'}t)$$

with $B_{p',q}$ the constant $C_{p',q}$ of Corollary 5.3. If we choose $p' = \min(2p, (p+q)/2)$ then a direct check shows that

This completes the proof. ■
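The embedding step at the start of this proof, $\|f\|_p^p \le p'/(p'-p)$ whenever $\|f\|_{p',\infty} \le 1$, holds for every probability measure, including the uniform measure on finitely many points. This Python sketch checks it on random heavy-tailed vectors. It again assumes the weak-type norm $\max_k f^*_k (k/n)^{1/p'}$ for $L_{p',\infty}$ on $n$ points. The seed, sample sizes and distribution are arbitrary illustrative choices.

```python
import random


def weak_norm(f, q):
    """Assumed weak L_{q,infty} norm for the uniform measure on len(f) points:
    max_k f*_k (k/n)^(1/q), with f* the decreasing rearrangement of |f|."""
    n = len(f)
    dec = sorted((abs(v) for v in f), reverse=True)
    return max(v * (k / n) ** (1 / q) for k, v in enumerate(dec, start=1))


def lp_power(f, p):
    """||f||_p^p with respect to the uniform probability measure."""
    return sum(abs(v) ** p for v in f) / len(f)


def embedding_holds(p=1.0, p_prime=2.0, trials=100, size=200, seed=0):
    """Check ||f||_p^p <= p'/(p'-p) after normalizing ||f||_{p',infty} = 1."""
    rng = random.Random(seed)
    bound = p_prime / (p_prime - p)
    for _ in range(trials):
        f = [rng.paretovariate(1.5) for _ in range(size)]  # heavy-tailed values
        scale = weak_norm(f, p_prime)
        g = [v / scale for v in f]
        if lp_power(g, p) > bound + 1e-9:
            return False
    return True


print(embedding_holds())  # True
```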

To prove Theorem 5.1 we first reduce the size of the domain (which can be assumed finite) by means of a probabilistic selection and then apply the covering Theorem 3.1.

In the probabilistic selection, we use a standard independent model. Given a finite set $I$ and a parameter $0 < \delta < 1$, we consider selectors $\delta_i$, $i \in I$, which are independent $\{0, 1\}$-valued random variables with $\mathbb{E}\delta_i = \delta$. Then the set $J = \{i \in I : \delta_i = 1\}$ is a random subset of $I$ and its average cardinality is $s = \delta|I|$. We call $J$ a random set of expected cardinality $s$.

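Lemma 5.5 Let $0 < \varepsilon < 1$. For $t \ge \varepsilon\delta m$,

$$\mathbb{P}\Big\{\Big|\sum_{i=1}^m (\delta_i - \delta)\Big| > t\Big\} \le 2\exp(-c\varepsilon t)$$

where $c > 0$ is an absolute constant.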

Proof. This follows from the Prokhorov-Bennett inequality. Let $(X_i)$ be a finite sequence of real valued independent mean zero random variables such that $\|X_i\|_\infty \le a$ for every $i$. If $b^2 = \sum_i \mathbb{E}X_i^2$, then for all $t > 0$

$$p := \mathbb{P}\Big\{\sum_i X_i > t\Big\} \le \exp\Big(\frac{t}{a} - \Big(\frac{t}{a} + \frac{b^2}{a^2}\Big)\log\Big(1 + \frac{at}{b^2}\Big)\Big) \quad (5.5)$$

which is less than $\exp(-t^2/4b^2)$ if $t \le b^2/2a$ (see e.g. 6.3).

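We apply the Prokhorov-Bennett inequality to $X_i = \delta_i - \delta$ with $a = 1$ and $b^2 = \delta m$. This value is an upper bound for $\sum_i \mathbb{E}X_i^2 = \delta(1-\delta)m$, which is allowed since the right hand side of (5.5) increases with $b^2$. Consider two cases:

1) $\varepsilon\delta m \le t \le 8\delta m$. Since in that case $t/16 \le b^2/2a$, we have

$$p \le \mathbb{P}\Big\{\sum_i X_i > t/16\Big\} \le \exp\Big(-\frac{t^2}{1024\,\delta m}\Big) \le \exp\Big(-\frac{\varepsilon t}{1024}\Big)$$

because $t \ge \varepsilon\delta m$.

2) $t > 8\delta m$. Then $\log(1 + at/b^2) = \log(1 + t/\delta m) \ge 2$, hence

$$p \le \exp\Big(-\frac{t}{a}\Big(\log\Big(1 + \frac{at}{b^2}\Big) - 1\Big)\Big) \le \exp(-t).$$

Thus for all $t \ge \varepsilon\delta m$ we have $p \le \exp(-c\varepsilon t)$. Repeating the argument for $-X_i$, we conclude the proof. ■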
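The claim after (5.5), that the Prokhorov-Bennett bound is at most $\exp(-t^2/4b^2)$ when $t \le b^2/2a$, can be spot-checked numerically. The sketch below evaluates the logarithm of the right hand side of (5.5) on a grid. It is a sanity check of the inequality, not a proof, and the parameter values are arbitrary.

```python
import math


def bennett_log_bound(t, a, b2):
    """Logarithm of the right hand side of (5.5):
    t/a - (t/a + b^2/a^2) * log(1 + a t / b^2)."""
    return t / a - (t / a + b2 / a ** 2) * math.log1p(a * t / b2)


def subgaussian_regime_holds(a=1.0, b2=50.0, steps=1000):
    """Check (5.5) <= exp(-t^2 / (4 b^2)) for t on a grid of (0, b^2 / (2a)]."""
    t_max = b2 / (2 * a)
    for j in range(1, steps + 1):
        t = t_max * j / steps
        if bennett_log_bound(t, a, b2) > -t * t / (4 * b2) + 1e-12:
            return False
    return True


print(subgaussian_regime_holds())  # True
```

The second test below also checks the bound $p \le \exp(-t)$ in case 2) with $b^2 = \delta m = 2$ and $t = 20 > 8\delta m$.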

Lemma 5.6 There exist absolute constants $C, c > 0$ for which the following holds. Let $\gamma > 0$ and let $Q$ be a system of subsets of $\{1, \dots, n\}$ such that

$$|S| \ge \gamma n \quad \text{for all } S \in Q.$$

If $\sigma$ is a random subset of $\{1, \dots, n\}$ of expected cardinality $k$ satisfying $|Q| \le 0.001 \cdot \exp(c\gamma k)$, then with probability at least 0.99 we have

$$\frac{|S \cap \sigma|}{|\sigma|} \ge 0.99\, \frac{|S|}{n} \quad \text{for all } S \in Q.$$



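Proof. Let $0 < \delta < 1/2$ and let $\delta_1, \dots, \delta_n$ be $\{0, 1\}$-valued independent random variables with $\mathbb{E}\delta_i = \delta$ for all $i$. Let $\delta = k/n$; consider the random set $\sigma = \{i : \delta_i = 1\}$. For any set $S \subset \{1, \dots, n\}$, $|S \cap \sigma| = \sum_{i \in S}\delta_i$. By Lemma 5.5 applied to a sum over $S$ instead of $\{1, \dots, m\}$ and with $t = 0.001\delta|S|$, there is an absolute constant $c_0 > 0$ such that

$$\mathbb{P}\{|S \cap \sigma| < 0.999\,\delta|S|\} \le 2\exp(-c_0\delta|S|).$$

Since $|S| \ge \gamma n$ for every $S \in Q$,

$$\mathbb{P}\Big\{|S \cap \sigma| < 0.999\,\frac{k}{n}|S|\Big\} \le 2\exp(-c_0\gamma k).$$

Therefore

$$\mathbb{P}\Big\{\forall S \in Q:\ |S \cap \sigma| \ge 0.999\,\frac{k}{n}|S|\Big\} \ge 1 - 2|Q|\exp(-c_0\gamma k) \ge 0.998$$

provided $c \le c_0/2$. Also, $|\sigma| \le 1.001k$ with probability at least 0.999 since $k$ can be assumed sufficiently large. On the intersection of these two events, $|S \cap \sigma|/|\sigma| \ge (0.999/1.001)\,|S|/n \ge 0.99\,|S|/n$ for all $S \in Q$. This completes the proof. ■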
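A quick simulation illustrates Lemma 5.6: a random set of expected cardinality $k$ sees every large set in roughly its true proportion. In this Python sketch the sizes, the number of sets and the seed are arbitrary illustrative choices. The function returns the worst ratio of $|S \cap \sigma|/|\sigma|$ to $|S|/n$ over the family.

```python
import random


def worst_proportion_ratio(n=20000, k=4000, num_sets=20, seed=2):
    """Draw a random subset sigma of {0, ..., n-1} of expected cardinality k and
    return min over the family Q of (|S & sigma| / |sigma|) / (|S| / n)."""
    rng = random.Random(seed)
    delta = k / n
    family = [set(rng.sample(range(n), n // 2)) for _ in range(num_sets)]  # |S| = n/2
    sigma = {i for i in range(n) if rng.random() < delta}                 # selectors
    return min((len(S & sigma) / len(sigma)) / (len(S) / n) for S in family)


print(worst_proportion_ratio())  # close to 1
```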
Given a finite set $I$, we will now work with Lorentz spaces $\Lambda_\varphi(I) = \Lambda_\varphi(I, \mu)$, where $\mu$ is the uniform measure on $I$. The following two lemmas reduce the size of $I$ while keeping both the boundedness of the class $F$ in the $\Lambda_\psi$-norm and the separation of $F$ in the $\Lambda_\varphi$-norm.

Lemma 5.7 Let $\psi$ be a generating function. Let $f$ be a function on a finite set $I$ such that

$$\|f\|_{\Lambda_\psi(I)} \le 1.$$

If $\sigma$ is a random subset of $I$ of expected cardinality $k \ge C$, then with probability at least 0.9 we have $\|f\|_{\Lambda_\psi(\sigma)} \le C$.
Proof. Let $a \ge 2$ be a parameter to be chosen later and let $\delta, \delta_j$ be as in the proof of Lemma 5.6 (here $n = |I|$). For $s \in \mathbb{Z}$ define the set

$$I_s = \{j \in I : |f(j)| > 2^s\}.$$

Since $\|f\|_{\Lambda_\psi(I)} \le 1$, we have

$$|I_s| \le \frac{n}{\psi(2^s)}. \quad (5.6)$$

Define also the event $A_s$ as

$$A_s = \Big\{|I_s \cap \sigma| > \frac{a\delta n}{\psi(2^s)}\Big\}.$$

We want to bound the probability that at least one $A_s$ occurs. Let $r$ be the maximal number such that $\delta|I_r| \ge 0.01$. Then

$$\mathbb{P}\{\forall i \in I_{r+1}:\ \delta_i = 0\} = (1 - \delta)^{|I_{r+1}|} \ge 1 - \delta|I_{r+1}| > 0.99.$$

If $A_s$ occurs for some $s > r$ then $I_s \cap \sigma$ is nonempty, hence the larger set $I_{r+1} \cap \sigma$ is nonempty, which happens with probability at most 0.01. Thus

$$\mathbb{P}\Big(\bigcup_{s > r} A_s\Big) \le 0.01. \quad (5.7)$$

To bound $\mathbb{P}(A_s)$ with $s \le r$, we will apply Lemma 5.5 with $m = |I_s|$ and $t = \frac{a\delta n}{2\psi(2^s)}$. Note that
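by (5.6) we have $t \ge \frac{a}{2}\delta|I_s| \ge \delta m$, and on the event $A_s$ we have $\sum_{i \in I_s}(\delta_i - \delta) > 2t - \delta m \ge t$. Then Lemma 5.5 gives

$$\mathbb{P}(A_s) \le 2\exp\Big(-\frac{ca\delta n}{2\psi(2^s)}\Big).$$

By the convexity of $\psi$,

$$\psi(wx) \ge w\psi(x) \quad \text{for all } x \ge 0 \text{ and } w \ge 1. \quad (5.8)$$

Thus $\psi(2^s) \le 2^{s-r}\psi(2^r)$. Then using (5.6) and the fact that $\delta|I_r| \ge 0.01$, we obtain

$$\mathbb{P}(A_s) \le 2\exp\Big(-\frac{ca\delta n\, 2^{r-s}}{2\psi(2^r)}\Big) \le 2\exp(-2^{r-s-1}ca\delta|I_r|) \le 2\exp(-0.01ca\,2^{r-s-1}).$$

So if $a$ is taken large enough then $\sum_{s=-\infty}^{r}\mathbb{P}(A_s) \le 0.01$. Combining this with (5.7), we conclude that

$$\mathbb{P}\Big(\bigcup_s A_s\Big) \le 0.02.$$

In addition, by Lemma 5.5 we have $\mathbb{P}\{|\sigma| < \frac12\delta n\} \le 0.02$, since $k$ can be assumed large enough.

Now suppose that $|\sigma| \ge \frac12\delta n$ and that none of the events $A_s$ occur, which happens with probability at least $1 - 0.02 - 0.02 = 0.96$. Fix any $t > 0$ and find an integer $s$ so that $2^s \le t < 2^{s+1}$. By the definitions of $A_s$, $I_s$ and by (5.8),

$$|\{i \in \sigma : |f(i)| > t\}| \le |I_s \cap \sigma| \le \frac{a\delta n}{\psi(2^s)} \le \frac{2a|\sigma|}{\psi(2^s)} \le \frac{2a|\sigma|}{\psi(t/2)} \le \frac{|\sigma|}{\psi(t/4a)}.$$

This means that $\|f\|_{\Lambda_\psi(\sigma)} \le 4a$. ■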



Lemma 5.8 Let $\varphi, \psi$ be Lorentz functions. Let $F$ be a class of functions on a finite set $I$, which is 1-bounded in the $\Lambda_\psi(I)$ norm. Assume that

$$\|x\|_{\Lambda_\varphi(I)} > t \quad \text{for all } x \in F. \quad (5.9)$$

If $\sigma$ is a random subset of $I$ of expected cardinality $k$ satisfying $|F| \le 0.001\exp\big(ck/(\varphi|\psi)(t)\big)$, then with probability at least 0.99 we have

$$\|x\|_{\Lambda_\varphi(\sigma)} \ge 0.99\|x\|_{\Lambda_\varphi(I)} \quad \text{for all } x \in F.$$

This lemma will be applied to the difference set $A - A$ of a $t$-net $A$ of the class $F$ in the theorem.



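Proof. We can assume that $I = \{1, \dots, n\}$. Fix an $x \in F$. Since $\big\| |x|/\|x\|_{\Lambda_\varphi(I)} \big\|_{\Lambda_\varphi(I)} = 1$, there exists an $s = s(x) > 0$ such that

On the other hand, since $\|x\|_{\Lambda_\varphi(I)} > t$ and $\|x\|_{\Lambda_\psi(I)} \le 1$, the measure in (5.10) is majorized by

$$\mu\{|x| > ts\} \le \frac{1}{\psi(ts)}.$$

Hence $\varphi(s) \ge \psi(ts)$ and therefore

$$\varphi(s) \le (\varphi|\psi)(t). \quad (5.11)$$

Now consider the family of subsets of $I$ defined as

$$S(x) = \big\{i : |x(i)| > s(x)\,\|x\|_{\Lambda_\varphi(I)}\big\}, \quad x \in F.$$

By (5.10) and (5.11), for every $x \in F$

$$\frac{|S(x)|}{n} \ge \frac{1}{\varphi(s(x))} \ge \frac{1}{(\varphi|\psi)(t)}.$$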

Let $\mu_\sigma$ denote the uniform probability measure on $\sigma$. Lemma 5.6 implies that whenever $|F| \le 0.001\exp\big(ck/(\varphi|\psi)(t)\big)$, a random subset $\sigma$ of $I$ of average cardinality $k$ satisfies with probability at least 0.99 that

$$\mu_\sigma\Big\{\frac{|x|}{\|x\|_{\Lambda_\varphi(I)}} > s(x)\Big\} = \frac{|S(x) \cap \sigma|}{|\sigma|} \ge 0.99\,\frac{|S(x)|}{n} \ge \frac{0.99}{\varphi(s(x))} \ge \frac{1}{\varphi(s(x)/0.99)} \quad \text{for all } x \in F,$$

where the last step uses (5.8). Hence

$$\|x\|_{\Lambda_\varphi(\sigma)} \ge 0.99\|x\|_{\Lambda_\varphi(I)} \quad \text{for all } x \in F.$$

The proof is complete. ■
Proof of Theorem 5.1. We may assume that $\Omega$ is finite. We then split the atoms of $\Omega$: replace an atom $\omega$ by, say, two atoms $\omega_1$ and $\omega_2$, each carrying measure $\frac12\mu(\omega)$, and define $f(\omega_1) = f(\omega_2) = f(\omega)$ for $f \in F$. This makes the measure $\mu$ almost uniform without changing either the covering numbers or the combinatorial dimension of $F$. So we can assume that $\mu$ is the uniform measure on $\Omega$.

Let $A$ be a $t$-separated subset of $F$ (which means that $\|f - g\|_{\Lambda_\varphi(\Omega)} > t$ for all $f \ne g$ in $A$) of size

$$\log|A| = D(F, \Lambda_\varphi(\Omega), t).$$

The difference set $\frac12(A - A) \setminus \{0\} = \{\frac12(f - g) : f \ne g;\ f, g \in A\}$ satisfies the assumptions of Lemma 5.8 with $t/2$ in (5.9). Then for $k$ defined by

a random subset $\sigma$ of $\Omega$ of average cardinality $k$ satisfies with probability at least 0.99 that

$$\big\|\tfrac12(f - g)\big\|_{\Lambda_\varphi(\sigma)} \ge 0.99\big\|\tfrac12(f - g)\big\|_{\Lambda_\varphi(\Omega)} > 0.99\,\frac{t}{2} > \frac{t}{3} \quad \text{for } f \ne g \text{ in } A.$$

This means that

$$A \text{ is } \tfrac{2t}{3}\text{-separated in } \Lambda_\varphi(\sigma) \quad (5.13)$$

and in particular

$$D(F, \Lambda_\varphi(\sigma), t/3) \ge \log|A| = D(F, \Lambda_\varphi(\Omega), t).$$

The advantage of the left hand side is that the size of $\sigma$ is controlled via (5.12).

We also need to keep $A$ well bounded in $\Lambda_\psi(\sigma)$. Denote by $\mathbb{E}_\sigma$ the average over the random set $\sigma$, that is, over the selectors $\delta_i$. By Lemma 5.7,

$$\mathbb{E}_\sigma |\{f \in A : \|f\|_{\Lambda_\psi(\sigma)} \le C\}| = \sum_{f \in A} \mathbb{P}(\|f\|_{\Lambda_\psi(\sigma)} \le C) \ge 0.9|A|.$$

Therefore with probability at least 0.8,

$$\text{at least a half of the functions in } A \text{ have norm} \le C. \quad (5.14)$$

Since $k/2 \le |\sigma| \le 2k$ holds with probability at least 0.9, there exists a realization of $\sigma$ that satisfies simultaneously this property, (5.13) and (5.14). Let $B$ be the set consisting of $\frac{3}{t}f$, where $f$ are the functions satisfying (5.14).

Summarizing, there exists a subset $\sigma$ of $\Omega$ and a set $B$ such that

• $B$ is a subset of $\frac{3}{t}A$,

• $B$ is $(C/t)$-bounded in $\Lambda_\psi(\sigma)$,

• $B$ is 2-separated in $\Lambda_\varphi(\sigma)$,

• $|B| \ge |A|/2 \ge c'\exp\big(c|\sigma|/(\varphi|\psi)(t/2)\big)$.

We can clearly assume that $\sigma = \{1, \dots, n\}$ and realize the space $\Lambda_\varphi(\sigma) = \Lambda_\varphi(\{1, \dots, n\})$ as $\mathbb{R}^n$ equipped with the Lorentz norm $\Lambda_\varphi$. Applying the covering Theorem 3.1 we get

iV(S, Tower") < (5.15) 

Since $B$ is 2-separated in $\Lambda_\varphi$ and by (5.2) the norm in this space is bounded by the $\mathrm{Tower}^n$ norm, the set $B$ is also 2-separated in the $\mathrm{Tower}^n$ norm. Hence

$$N(B, \mathrm{Tower}^n) = |B|. \quad (5.16)$$

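The right hand side of (5.15) can be estimated through Lemma 4.7. By (5.1) and convexity, $\psi(t) \ge t$ for $t \ge 1$. Thus $C\|f\|_{\Lambda_\psi} \ge \|f\|_{L_{1/2}}$ for all functions $f$. Hence

$$B \subset (C/t)\,\mathrm{Ball}(\Lambda_\psi) \subset (C'/t)\,\mathrm{Ball}(L_{1/2}^n).$$

Hence by (5.15), (5.16) and Lemma 4.7,

$$|B| \le \Big(\frac{Cn}{tv}\Big)^{Cv}, \quad \text{where } v = v(B) = v(B, 1) \le v(A, t/3). \quad (5.17)$$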

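We also have a lower bound $|B| \ge c'\exp(\alpha n)$ with $\alpha = c/(\varphi|\psi)(t/2)$. Taking logarithms of the upper and the lower bounds, we obtain $\alpha n/v \le C\log(Cn/tv)$, from which it follows that

$$\frac{n}{v} \le \frac{C}{\alpha}\log\frac{C}{t\alpha}.$$

Plugging this back into (5.17), we obtain

$$|B| \le \Big(\frac{C}{t\alpha}\log\frac{C}{t\alpha}\Big)^{Cv} \le \Big(\frac{C}{t\alpha}\Big)^{2Cv}.$$

Note that

$$(\varphi|\psi)(t) \ge 1/t \quad \text{for } 0 < t < 1.$$

Indeed, $\varphi(1/t) \ge 1 = \psi(t \cdot 1/t)$, which implies $(\varphi|\psi)(t) \ge \varphi(1/t) \ge 1/t$, the last step by (5.1) and convexity. Therefore $t \ge (2/c)\alpha$ and finally

$$|B| \le \alpha^{-Cv} \le (\varphi|\psi)(t/2)^{Cv}.$$

On the other hand, by the construction

$$\log|B| \ge \log\big(\tfrac12|A|\big) \ge c\,D(F, \Lambda_\varphi(\Omega), t).$$

This completes the proof. ■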
6 Random processes and the uniform entropy

Here we prove our main results, Theorems 1.2 and 1.3, which compare the uniform entropy $D(F, t)$ to the combinatorial dimension $v(F, t)$. One direction of this comparison is easy: for every class of functions $F$ and every $t > 0$,

$$D(F, t) \ge c\, v(F, 2t) \quad (6.1)$$

where $c > 0$ is an absolute constant, see [T 02].

The reverse inequality is not true in general, even for $\{0, 1\}$ classes. Let, for example, $F$ be the collection of $n$ characteristic functions $\mathbf{1}_{\{i\}}$ of the singletons $i \in \{1, \dots, n\}$. Then for $0 < t < n^{-1/2}$ we have $D(F, t) = \log n$ while $v(F, t) = 1$.

Nevertheless, we are able to show that the reverse to (6.1) holds: 1) under a minimal regularity of $F$, and 2) always after taking integrals on both sides.



Integral equivalence. The following is a general form of Theorem 1.2.

Theorem 6.1 For every class $F$ and for any $b > 0$,

$$\int_b^\infty \sqrt{D(F, t)}\, dt \le C \int_{cb}^\infty \sqrt{v(F, t)}\, dt. \quad (6.2)$$

The proof of Theorem 6.1 is based on the following

Lemma 6.2 Let $a \ge 2$ and let $F$ be a function class. Then for all $t > 0$,

$$D(F, t) \le C\log a \sum_{j=0}^\infty 4^j\, v(F, c a^j t). \quad (6.3)$$

Proof. The proof of the lemma uses an iteration argument. It relies on the following fact, valid for arbitrary sets $K$, $D$ and $L$ in $\mathbb{R}^n$:

$$N(K, D) \le N(K, L) \sup_{z \in \mathbb{R}^n} N((K + z) \cap L, D). \quad (6.4)$$

To check this, first cover $K$ by translates of $L$ and then cover the intersection of $K$ with each translate by appropriate translates of $D$.

It will be easier to work with the "covering" analog of $D(F, t)$, so we define a covering version of $D(F, X, t)$ in (2.5) as

$$D'(F, X, t) = \log \min\big\{N : \exists f_1, \dots, f_N \in X\ \forall f \in F\ \exists i\ \|f - f_i\|_X \le t\big\}.$$

By the standard comparison of packing and covering numbers,

$$D(F, X, 2t) \le D'(F, X, t) \le D(F, X, t). \quad (6.5)$$



We can clearly assume the domain $\Omega$ to be finite. Fix the underlying probability $\mu$ on $\Omega$ and $t > 0$. For $j = 1, 2, \dots$ define

$$t_j = a^{j-1}t \quad \text{and} \quad X_j = L_{2^j}(\Omega, \mu).$$

We estimate $D'(F, L_2(\Omega, \mu), t)$ by (6.4):

$$D'(F, X_1, t_1) \le D'(F, X_2, t_2) + \sup_h D'\big((F + h) \cap t_2\mathrm{Ball}(X_2), X_1, t_1\big)$$

where the supremum is over all functions $h$ on the (finite) domain $\Omega$. Iterating this inequality, we obtain

$$D'(F, X_1, t_1) \le \sum_{j=1}^\infty \sup_h D'(F_j(h), X_j, t_j) \quad (6.6)$$

where $F_j(h) = (F + h) \cap t_{j+1}\mathrm{Ball}(X_{j+1})$. Obviously, the class $F_j(h)$ is $t_{j+1}$-bounded in $X_{j+1}$. Then applying (5.4) to the class $t_{j+1}^{-1}F_j(h)$ with $p = 2^j$ and $q = 2^{j+1}$, we obtain

$$D'(F_j(h), X_j, t_j) \le C4^j v(F_j(h), ct_j) \cdot \log(t_{j+1}/t_j) \le C4^j v(F, ct_j) \cdot \log a$$

because clearly $v(F_j(h), s) \le v(F, s)$ for every $s$. To complete the proof we substitute the previous inequality into (6.6) and use (6.5). ■

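Proof of Theorem 6.1. Applying Lemma 6.2 and Jensen's inequality (in the form $\sqrt{\sum_j x_j} \le \sum_j \sqrt{x_j}$), we have for $a = 3$:

$$\int_b^\infty \sqrt{D(F, t)}\, dt \le C\sqrt{\log a} \sum_{j=0}^\infty 2^j \int_b^\infty \sqrt{v(F, c a^j t)}\, dt = \frac{C\sqrt{\log a}}{c} \sum_{j=0}^\infty (2/a)^j \int_{c a^j b}^\infty \sqrt{v(F, u)}\, du \le C' \int_{cb}^\infty \sqrt{v(F, u)}\, du.$$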



Remark. The square roots in (6.2) can of course be replaced by any other equal positive powers. The present form of (6.2) was chosen to match Dudley's entropy integral, see Theorem 6.5 below.

Theorem 1.1 follows from Theorem 6.1 as explained in the introduction. The gap between the sufficient and the necessary conditions in Theorem 1.1 is known to be unavoidable in general (at least in the description of universal Donsker classes, see [Du 99], Propositions 10.1.8 and 10.1.14).






Pointwise equivalence A similar argument, which we give now, completes the proof of the other main result of the paper, Theorem 1.3. Although (6.1) cannot be reversed in general, a remarkable fact is that it can be reversed if the combinatorial dimension is polynomial in $t$.

Theorem 6.3 Let $F$ be a class of functions and $t > 0$. Assume that there exist positive numbers $v$ and $\alpha < 1$ such that

$$v(F,tx) \le v\,x^{-\alpha} \quad\text{for all } x \ge 1. \tag{6.7}$$

Then

$$D(F,Ct) \le (C/\alpha)\,v.$$

Proof. Applying Lemma 6.2 with $a = 5^{1/\alpha}$ and estimating the combinatorial dimension via (6.7), we have

$$D(F,t/c) \le C\log a\sum_{j=0}^{\infty} 4^j\,v\,a^{-\alpha j} \le (C/\alpha)\,v.$$

This completes the proof. ■ 
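The choice $a = 5^{1/\alpha}$ is what turns the series of Lemma 6.2 into a fixed geometric series. Spelled out (our own expansion of the arithmetic; note also $a > 5 > 2$ because $\alpha < 1$, as Lemma 6.2 requires):

```latex
a^{-\alpha j} = 5^{-j}, \qquad
\sum_{j=0}^\infty 4^j\, v\, a^{-\alpha j} = v\sum_{j=0}^\infty \Big(\frac45\Big)^j = 5v, \qquad
\log a = \frac{\log 5}{\alpha},
\qquad\text{hence}\qquad
C\log a\sum_{j=0}^\infty 4^j\, v\, a^{-\alpha j} = \frac{5\,C\log 5}{\alpha}\,v .
```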

The following is a general form of Theorem 1.3. It improves upon Talagrand's inequality proved in [T 87], see [T 02].

Corollary 6.4 Let $F$ be a class of functions and $t > 0$. Assume that there exist a decreasing function $v(t)$ and a number $a > 2$ such that

$$v(F,s) \le v(s) \quad\text{and}\quad v(as) \le \tfrac12 v(s) \quad\text{for all } s \ge t. \tag{6.8}$$

Then

$$D(F,Ct) \le C\log a\cdot v(t).$$

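Proof. Applying (6.8) recursively, we have $v(a^j t) \le 2^{-j}v(t)$ for all $j = 0,1,2,\dots$. Let $x \ge 1$ and choose $j$ so that $a^j \le x < a^{j+1}$. Then

$$x^{\log 2/\log a} < 2^{j+1},$$

so

$$v(F,tx) \le v(tx) \le v(ta^j) \le 2^{-j}v(t) \le 2\,v(t)\,x^{-\log 2/\log a}.$$

Since $a > 2$, the exponent $\log 2/\log a$ is less than 1, so (6.7) holds with $\alpha = \log 2/\log a$ and with $2v(t)$ in place of $v$. The conclusion follows by Theorem 6.3. ■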
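The only computation in this proof is the comparison $2^{-j} \le 2x^{-\log 2/\log a}$ for $a^j \le x < a^{j+1}$. The short check below, our own illustration with hypothetical function names, verifies it numerically on a small grid and confirms that $\alpha = \log 2/\log a$ stays below 1 whenever $a > 2$, as Theorem 6.3 requires.

```python
# Illustrative sketch; all function names here are hypothetical, not from the paper.
import math


def exponent(a):
    """alpha = log 2 / log a, the polynomial decay rate produced by (6.8)."""
    return math.log(2) / math.log(a)


def block_index(x, a):
    """The integer j >= 0 with a**j <= x < a**(j + 1), for x >= 1."""
    j = 0
    while a ** (j + 1) <= x:
        j += 1
    return j


def check(a, x, rel_tol=1e-12):
    """Check 2**-j <= 2 * x**-alpha, allowing a tiny floating-point slack."""
    alpha = exponent(a)
    j = block_index(x, a)
    return 2.0 ** -j <= 2 * x ** -alpha * (1 + rel_tol)


grid = [(a, x) for a in (2.5, 3, 10) for x in (1, 1.5, 2.9, 3, 7, 100, 1e6)]
print(all(check(a, x) for a, x in grid), [round(exponent(a), 3) for a in (2.5, 3, 10)])
```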
A combinatorial bound on Gaussian processes A quantitative version of Theorem 1.1 is the following bound on Gaussian processes indexed by $F$ in terms of the combinatorial dimension of $F$.

Let $F$ be a class of functions on an $n$-point set $I$. The standard Gaussian process indexed by $f \in F$ is

$$X_f = \sum_{i\in I} g_i f(i)$$

where $(g_i)$ are independent $N(0,1)$ random variables. The problem is to bound the supremum of the process $(X_f)$ normalized by the standard deviation as in the Central Limit Theorem:

$$E(F) = n^{-1/2}\,\mathbb{E}\sup_{f\in F} X_f.$$
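For small finite classes, $E(F)$ can be estimated by straightforward Monte Carlo, which gives a convenient sanity check for the bounds below. The sketch is our own (the function name and the sample size are arbitrary, hypothetical choices), and a class is represented as a list of tuples $(f(1),\dots,f(n))$. For the cube $F = \{-1,1\}^n$ the supremum equals $\sum_i |g_i|$, so $E(F) = \sqrt{2n/\pi}$, which the estimate should reproduce.

```python
# Illustrative sketch; the function name and trial count are hypothetical choices.
import itertools
import math
import random


def gaussian_average_sup(F, trials=10000, seed=0):
    """Monte Carlo estimate of E(F) = n**-0.5 * E sup_{f in F} sum_i g_i f(i)."""
    rng = random.Random(seed)
    n = len(F[0])
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += max(sum(gi * fi for gi, fi in zip(g, f)) for f in F)
    return total / trials / math.sqrt(n)


n = 4
cube = list(itertools.product((-1, 1), repeat=n))
print(gaussian_average_sup(cube), math.sqrt(2 * n / math.pi))
```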
Theorem 6.5 For every class $F$,

$$E(F) \le C\int_0^\infty \sqrt{v(F,t)}\,dt \tag{6.9}$$

where $C$ is an absolute constant. Moreover, the lower limit $0$ in (6.9) can be replaced by $c\,n^{-1/2}E(F)$, where $c > 0$ is an absolute constant.

Proof. By Dudley's entropy integral inequality,

$$E(F) \le C\int_{n^{-1/2}E(F)}^\infty \sqrt{D(F, L_2(\mu), t)}\,dt \tag{6.10}$$

where $\mu$ is the uniform probability measure on $I$, see [MV 03]. Then the proof is completed by Theorem 6.1. ■

In 1992, M. Talagrand proved Theorem 6.5 for uniformly bounded convex classes and up to an additional factor of $\log^M(1/t)$ in the integrand; this was a main result of [T 92]. The absolute constant $M$ was reduced to 1/2 in [MV 03]. Theorem 6.5 is optimal. We emphasize its important meaning:

In Dudley's classical entropy integral, the entropy can be replaced by the combinatorial dimension.

Optimality of the bound on Gaussian processes We conclude this section by showing the sharpness of Theorem 6.5. For every $n$ one easily finds a class $F$ for which the inequality in Theorem 6.5 can be reversed; this is true e.g. for $F = \{-1,1\}^n$. More importantly, the integral in Theorem 6.5 cannot be improved in general to the (Sudakov-type) supremum $\sup_{t>0} t\sqrt{v(F,t)}$. This is so even if we replace the Gaussian process $X_f$ by the Rademacher process

$$\sum_{i\in I}\varepsilon_i f(i), \qquad f \in F,$$

where $(\varepsilon_i)$ are independent symmetric $\pm1$ valued random variables. The average supremum of such a process,

$$E_{\rm rad}(F) = n^{-1/2}\,\mathbb{E}\sup_{f\in F}\sum_{i\in I}\varepsilon_i f(i),$$

is well known to be majorized by that of the Gaussian process: $E_{\rm rad}(F) \le C\,E(F)$ (see [LT] Lemma 4.5).
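For very small $n$ the Rademacher average can be computed exactly, with no sampling error, by averaging over all $2^n$ sign vectors. The sketch below is our own illustration (hypothetical function name, classes as lists of tuples). For the cube $\{-1,1\}^n$ every supremum equals $n$, so $E_{\rm rad} = \sqrt{n}$, while for a single function the average of $\sum_i \varepsilon_i f(i)$ is zero.

```python
# Illustrative sketch; the function name is hypothetical, not from the paper.
import itertools
import math


def rademacher_average_sup(F):
    """Exact E_rad(F) = n**-0.5 * E sup_{f in F} sum_i eps_i f(i), over all sign vectors."""
    n = len(F[0])
    total = 0.0
    for eps in itertools.product((-1, 1), repeat=n):
        total += max(sum(e * fi for e, fi in zip(eps, f)) for f in F)
    return total / 2 ** n / math.sqrt(n)


cube3 = list(itertools.product((-1, 1), repeat=3))
print(rademacher_average_sup(cube3), math.sqrt(3))
```

For the cube the Gaussian value is $\sqrt{2n/\pi}$, so here the two averages differ by the factor $\sqrt{\pi/2}$, in line with the comparison $E_{\rm rad}(F) \le C\,E(F)$.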

Proposition 6.6 For every $n$, there exists a class $F$ of functions on $\{1,\dots,n\}$ uniformly bounded by 1 and such that

$$E_{\rm rad}(F) \ge c_1\int_0^\infty \sqrt{v(F,t)}\,dt \ge c\log n\cdot\sup_{t>0} t\sqrt{v(F,t)}. \tag{6.11}$$

Our example will be constructed as sums of random vertices of the discrete cube with quickly decreasing weights.

We shall bound $E_{\rm rad}(F)$ from below via a Sudakov-type minoration for Rademacher processes. Let $D = \mathrm{Ball}(L_2) = \sqrt{n}\,B_2^n$. Proposition 4.13 of [LT] with $\varepsilon = \dots$ states the following:

Fact 6.7 If $A$ is a subset of $\mathbb{R}^n$ and

$$\sup_{x\in A}\|x\|_\infty \le \frac{c_1\sqrt{n}}{E_{\rm rad}(A)} \tag{6.12}$$

then

$$\sqrt{\log N(A, \tfrac12 D)} \le C\,E_{\rm rad}(A). \tag{6.13}$$

The entropy in (6.13) will be estimated in a standard way:

Fact 6.8 There exists an absolute constant $\alpha$ such that the following holds. Let $A$ be a set of $N \le e^{\alpha n}$ random vertices of the discrete cube $\{-1,1\}^n$, i.e. $A$ consists of $N$ independent copies of a random vector $(\varepsilon_1,\dots,\varepsilon_n)$. Then with probability at least 1/2,

$$N(A, \tfrac12 D) \ge \sqrt{N}.$$
Proof. Obviously, we can assume that $N \ge 2$. Assume that the event

$$N(A, \tfrac12 D) < \sqrt{N} \tag{6.14}$$

occurs. Then there exists a translate $D_x = \tfrac12 D + x$ of $\tfrac12 D$ which contains at least $N/N(A,\tfrac12 D) > \sqrt{N}$ points from $A$. Set $A' = A\cap D_x$. By dividing the set $A'$ into pairs in an arbitrary way, we can find a set $V$ of $M \ge \dots$ pairs $(x,y) \in A'\times A'$, $x \ne y$, so that each point from $A$ belongs to at most one pair in $V$. Since $A'$ lies in a single translate of $\tfrac12 D$, we have

$$\|x - y\|_{L_2} \le 1 \quad\text{for all } (x,y)\in V. \tag{6.15}$$

Thus if (6.14) occurs, then (6.15) occurs for some $M$-element set $V \subset A\times A$. Let now $V$ be a fixed set of $M$ disjoint pairs of elements of $A$. Then

$$\mathbb{P}\big(\text{event (6.15) occurs}\big) = \big(\mathbb{P}(\|x-y\|_{L_2} \le 1)\big)^M \tag{6.16}$$

where $x$ and $y$ are independent random vertices of the discrete cube. Here we used the fact that the pairs in $V$ are disjoint from each other and, consequently, are jointly independent. The probability in (6.16) is easily estimated using the Prokhorov-Bennett inequality (5.5):

$$\mathbb{P}(\|x-y\|_{L_2} \le 1) = \mathbb{P}\Big(\sum_{i=1}^n |\varepsilon_{1i}-\varepsilon_{2i}|^2 \le n\Big) \le e^{-c_2 n}$$

where $(\varepsilon_{1i})$ and $(\varepsilon_{2i})$ are independent copies of the random vector $(\varepsilon_i)$.
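The exponential decay of $\mathbb{P}(\|x-y\|_{L_2}\le 1)$ can be seen exactly: $|\varepsilon_{1i}-\varepsilon_{2i}|^2$ equals 4 when the coordinates differ and 0 otherwise, so the event says that at most $n/4$ of the $n$ coordinates differ, and the number of differing coordinates is Binomial$(n,1/2)$. The sketch below (our own, with hypothetical function names) computes this probability exactly and compares it with Hoeffding's bound $e^{-n/8}$. Hoeffding's inequality is swapped in here for the Prokhorov-Bennett inequality (5.5) used in the text, so the constant $1/8$ is Hoeffding's and not the paper's $c_2$.

```python
# Illustrative sketch; function names are hypothetical, and Hoeffding replaces (5.5).
import math


def prob_close_pair(n):
    """Exact P(||x - y||_{L_2} <= 1) for independent uniform x, y in {-1, 1}^n.

    ||x - y||_{L_2}^2 = (4/n) * #{i : x_i != y_i}, so the event is
    'at most n/4 coordinates differ', with #differences ~ Binomial(n, 1/2).
    """
    return sum(math.comb(n, k) for k in range(n // 4 + 1)) / 2 ** n


def hoeffding_bound(n):
    """Hoeffding: P(Bin(n, 1/2) <= n/2 - n/4) <= exp(-2 * (n/4)**2 / n) = exp(-n/8)."""
    return math.exp(-n / 8)


for n in (8, 32, 128):
    print(n, prob_close_pair(n), hoeffding_bound(n))
```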

To estimate the probability that the event (6.14) occurs, note that there are fewer than $\binom{N^2}{M}$ ways to choose $V$. Therefore

, := r{NiA, Id) < Vn) < (^') (e— )- < (^e"-)" < (iVe--)-^/^ 

Since > 2, we have M > 1. If a < C2/2, we conclude that p < exp(-^ •3M/2) < 
1/2. This completes the proof. ■ 

Corollary 6.9 There exists an absolute constant $\alpha$ such that the following holds. Let $A$ be a set of $N \le e^{\alpha n}$ random vertices of the discrete cube. Then with probability at least 1/2,

$$c\sqrt{\log N} \le E_{\rm rad}(A) \le C\sqrt{\log N}.$$

Proof. Since $A \subset \{-1,1\}^n \subset \sqrt{n}\,S^{n-1}$, we have

$$E_{\rm rad}(A) \le C\,E(A) \le C_1\sqrt{\log N}, \tag{6.17}$$

see [LT] (3.13). To prove the reverse inequality, assume that $N \le e^{\beta n}$ for $\beta = \min(\alpha, (c_1/C_1)^2)$, where $\alpha$, $c_1$ and $C_1$ are the absolute constants from Fact 6.8, (6.12) and (6.17) respectively. Then (6.17) implies that (6.12) is satisfied. Hence by Facts 6.7 and 6.8, with probability at least 1/2 we have

$$E_{\rm rad}(A) \ge (1/C)\sqrt{\log N(A,\tfrac12 D)} \ge (c/\sqrt2)\sqrt{\log N}.$$

This completes the proof. ■



Proof of Proposition 6.6. Fix a positive integer $n$. Let $k_1$ be the maximal integer so that $2^{4^{k_1}} \le e^{\alpha n}$, where $\alpha$ is the absolute constant from Corollary 6.9. For each $1 \le k \le k_1$ define a set $F_k$ in $\mathbb{R}^n$ as follows. Let $N(k) = 2^{4^k}$ and let $A_k = \{x_1,\dots,x_{N(k)}\}$ be a family of points in the discrete cube $\{-1,1\}^n$ satisfying Corollary 6.9. Set

$$F_k = 2^{-k}A_k \quad\text{and put}\quad F = \sum_{k=1}^{k_1} F_k$$

where the sum is the Minkowski sum: $A + B = \{a + b : a\in A,\ b\in B\}$. Then $F$ is a uniformly bounded class of functions on $\{1,\dots,n\}$.
By Corollary 6.9 we have

$$E_{\rm rad}(F) = \sum_{k=1}^{k_1} E_{\rm rad}(F_k) \ge c\sum_{k=1}^{k_1} 2^{-k}\sqrt{\log N(k)} \ge c\,k_1 \ge c_1\log n. \tag{6.18}$$
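The bookkeeping in this construction is simple enough to compute directly. The sketch below is ours: it finds $k_1$, the largest integer with $2^{4^{k_1}} \le e^{\alpha n}$ (that is, $4^{k_1}\log 2 \le \alpha n$), and evaluates $\sum_{k=1}^{k_1} 2^{-k}\sqrt{\log N(k)}$ with $N(k) = 2^{4^k}$. Each term equals $\sqrt{\log 2}$, so the sum is $k_1\sqrt{\log 2}$, which grows like $\log n$. The value of $\alpha$ below is a hypothetical placeholder, since the text only says it is an absolute constant.

```python
# Illustrative sketch; function names and the value of ALPHA are hypothetical.
import math


def k1(n, alpha):
    """Largest k with 4**k * log 2 <= alpha * n (returns 0 if k = 1 already fails)."""
    k = 0
    while 4 ** (k + 1) * math.log(2) <= alpha * n:
        k += 1
    return k


def lower_bound_sum(n, alpha):
    """sum_{k=1}^{k1} 2**-k * sqrt(log N(k)) with N(k) = 2**(4**k); every term is sqrt(log 2)."""
    return sum(2.0 ** -k * math.sqrt(4 ** k * math.log(2)) for k in range(1, k1(n, alpha) + 1))


ALPHA = 0.1  # hypothetical value of the absolute constant from Corollary 6.9
for n in (10 ** 4, 10 ** 6, 10 ** 8):
    print(n, k1(n, ALPHA), lower_bound_sum(n, ALPHA), math.log(n))
```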

To estimate the combinatorial dimension of $F$, fix $t = 2^{-k}$ with $1 \le k \le k_1$. Then

$$v(F,t) \le v\Big(\sum_{l=1}^{k+1} F_l,\ t/2\Big)$$

because the diameter of $\sum_{l=k+2}^{k_1} F_l$ in the $\ell_\infty$ norm is at most $t/2$. Obviously, by definition of the combinatorial dimension, for any finite set $F$ we have $v(F,t) \le \log_2|F|$, so

$$v(F,t) \le \log_2\Big|\sum_{l=1}^{k+1}F_l\Big| \le \sum_{l=1}^{k+1}\log_2|F_l| = \sum_{l=1}^{k+1}4^l \le C4^k. \tag{6.19}$$
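The bound $v(F,t) \le \log_2|F|$ holds because a $t$-shattered set $\sigma$ needs a separate function for each of the $2^{|\sigma|}$ patterns. The brute-force sketch below is our own illustration and assumes the notion of $t$-shattering from the earlier sections, recalled in the code comments: $\sigma$ is $t$-shattered if for some level function $h$ and every splitting of $\sigma$ into a lower and an upper part, some $f \in F$ has $f \le h$ on the lower part and $f \ge h + t$ on the upper part. It restricts each level $h(i)$ to the finitely many values $f(i)$; this loses nothing, since lowering $h(i)$ to the largest value $f(i)$ not exceeding it keeps every required inequality.

```python
# Illustrative sketch; function names are hypothetical, shattering definition assumed as stated.
import itertools
import math


def is_shattered(F, sigma, t):
    """Is the coordinate set sigma t-shattered by the finite class F (tuples of values)?"""
    # Levels h(i) may be restricted to the observed values f(i) without loss of generality.
    levels = [sorted({f[i] for f in F}) for i in sigma]
    for h in itertools.product(*levels):
        if all(any(all(f[i] >= hi + t if up else f[i] <= hi
                       for i, hi, up in zip(sigma, h, pattern))
                   for f in F)
               for pattern in itertools.product((False, True), repeat=len(sigma))):
            return True
    return False


def comb_dim(F, t):
    """Brute-force combinatorial dimension v(F, t): size of the largest t-shattered set."""
    n = len(F[0])
    return max((k for k in range(1, n + 1)
                if any(is_shattered(F, s, t) for s in itertools.combinations(range(n), k))),
               default=0)


cube = list(itertools.product((-1, 1), repeat=3))
small = [(0, 1), (1, 0), (0, 0)]
print(comb_dim(cube, 2), comb_dim(cube, 2.5), comb_dim(small, 1), math.log2(len(small)))
```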






This shows that $t\sqrt{v(F,t)} \le C_1$ for all $t \ge 2^{-k_1}$. Since $2^{-k_1} \le 2/\sqrt{n}$ and clearly $t\sqrt{v(F,t)} \le t\sqrt{n} \le 2$ for all $t \le 2^{-k_1}$, we conclude that

$$\sup_{t>0} t\sqrt{v(F,t)} \le C_1. \tag{6.20}$$



Also, since $F$ is uniformly bounded by 1, we have $v(F,t) = 0$ for all $t > 2$. Moreover, for all $f,g \in F$ and all $i \in \{1,\dots,n\}$, we have $|f(i) - g(i)| \ge 2^{-k_1}$ whenever $f(i) \ne g(i)$. Hence, $v(F,t) = v(F,t_1)$ for all $t \le t_1 = 2^{-k_1}$. Thus,

$$\int_0^\infty \sqrt{v(F,t)}\,dt \le C\sum_{k=0}^{k_1} 2^{-k}\sqrt{v(F,2^{-k})} \le C k_1 \le C\log n.$$

This and Theorem 6.5 imply that the leftmost and the middle quantities in (6.11) are both equivalent to $\log n$ up to an absolute constant factor. Together with (6.20), this completes the proof. ■



7 Sections of convex bodies 

First applications of entropy inequalities involving the combinatorial dimension to geometric functional analysis are due to M. Talagrand [T 92]. Using his entropy inequality (which (6.9) strengthens) he proved that for Banach spaces infratype $p$ implies type $p$ ($1 < p < 2$). He also proved Elton's classical Theorem with asymptotics that fell short of optimal, improving earlier estimates by J. Elton [E] and A. Pajor [Paj]; the optimal asymptotics were found in [MV 03] using (2.10).

Here we will use new covering results to find nice coordinate sections of a general convex body (for simplicity, we will assume that the body is symmetric with respect to the origin). Our main result is related to three classical results in geometric functional analysis: Dvoretzky's Theorem in the form of V. Milman (see [MS] 4.2), the Bourgain-Tzafriri Principle of Restricted Invertibility [BT 87], and Elton's Theorem ([E], see also [Pi], [T 92], [MV 03]).

By $B_p^n$ we denote the unit ball of $\ell_p^n$, that is, the set of all $x \in \mathbb{R}^n$ such that $\sum_i |x(i)|^p \le 1$. Let $K$ be a convex body symmetric with respect to the origin. Its average size is measured by $M_K = \int_{S^{n-1}} \|x\|_K\, d\sigma(x)$, where $\sigma$ is the normalized Lebesgue measure on the sphere $S^{n-1}$ and $\|\cdot\|_K$ denotes the Minkowski functional of $K$ (the seminorm whose unit ball is $K$).
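As an illustration of $M_K$, take the body $K = a\sqrt{n}\,B_1^n$ that also appears in the Remark after Corollary 7.9; its Minkowski functional is $\|x\|_1/(a\sqrt{n})$. A uniform point on $S^{n-1}$ can be sampled as a normalized standard Gaussian vector, and for $\theta$ uniform on the sphere $\mathbb{E}|\theta_1| = \Gamma(n/2)/(\sqrt{\pi}\,\Gamma((n+1)/2))$, which gives the exact value $M_K = \sqrt{n}\,\mathbb{E}|\theta_1|/a \approx \sqrt{2/\pi}/a$. The sketch (our own, with hypothetical helper names and sample sizes) compares a Monte Carlo estimate with this closed form.

```python
# Illustrative sketch; helper names, n, a and the sample size are hypothetical choices.
import math
import random


def sample_sphere(n, rng):
    """Uniform random point on S^{n-1}: a standard Gaussian vector divided by its length."""
    g = [rng.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(x * x for x in g))
    return [x / r for x in g]


def average_norm(norm, n, samples=2000, seed=1):
    """Monte Carlo estimate of M_K, the average of ||x||_K over S^{n-1}."""
    rng = random.Random(seed)
    return sum(norm(sample_sphere(n, rng)) for _ in range(samples)) / samples


def exact_M_scaled_l1(a, n):
    """Closed form of M_K for K = a*sqrt(n)*B_1^n, via E|theta_1| = Gamma(n/2)/(sqrt(pi)*Gamma((n+1)/2))."""
    e_abs = math.exp(math.lgamma(n / 2) - math.lgamma((n + 1) / 2)) / math.sqrt(math.pi)
    return math.sqrt(n) * e_abs / a


n, a = 200, 2.0


def norm_K(x):
    """Minkowski functional of K = a*sqrt(n)*B_1^n."""
    return sum(abs(v) for v in x) / (a * math.sqrt(n))


print(average_norm(norm_K, n), exact_M_scaled_l1(a, n), math.sqrt(2 / math.pi) / a)
```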

Theorem 7.1 (Dvoretzky Theorem, see [MS]) Let $K$ be a symmetric convex body in $\mathbb{R}^n$ containing $B_2^n$. Then there exists a subspace $E$ in $\mathbb{R}^n$ of dimension $k \ge cM_K^2 n$ and such that

$$c(B_2^n\cap E) \subset M_K(K\cap E) \subset C(B_2^n\cap E). \tag{7.1}$$

Moreover, a random subspace $E$ taken uniformly in the Grassmannian $G_{n,k}$ satisfies (7.1) with probability at least $1 - e^{-ck}$.

The next theorem, the Principle of Restricted Invertibility due to J. Bourgain and L. Tzafriri, is the first and probably the most used result from the extensive paper [BT 87]. By $(e_i)$ we denote the canonical basis of $\mathbb{R}^n$.

Theorem 7.2 (J. Bourgain and L. Tzafriri [BT 87]) Let $T : \ell_2^n \to \ell_2^n$ be a linear operator with $\|Te_i\| \ge 1$ for all $i$. Then there exists a subset $\sigma$ of $\{1,\dots,n\}$ of size $|\sigma| \ge cn/\|T\|^2$ and such that

$$\|Tx\| \ge c\|x\| \quad\text{for all } x\in\mathbb{R}^\sigma.$$

Denote by $(\varepsilon_i)$ Rademacher random variables, i.e. a sequence of independent symmetric $\pm1$ valued random variables.

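Theorem 7.3 (J. Elton [E]) Let $x_1,\dots,x_n$ be vectors in a real Banach space, satisfying

$$\|x_i\| \le 1 \ \ \forall i \quad\text{and}\quad \mathbb{E}\Big\|\sum_{i=1}^n \varepsilon_i x_i\Big\| \ge \delta n$$

for some number $\delta > 0$. Then there exists a subset $\sigma \subset \{1,\dots,n\}$ of cardinality $|\sigma| \ge c_1(\delta)n$ such that

$$\Big\|\sum_{i\in\sigma} a_i x_i\Big\| \ge c_2(\delta)\sum_{i\in\sigma}|a_i|$$

for all real numbers $(a_i)$.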



The best possible asymptotics is known: $c_1(\delta) \asymp \delta^2$ and $c_2(\delta) \asymp \delta$ [MV 03].

Now we state our main result. By $(g_i)$ we denote a sequence of independent normalized Gaussian random variables.



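Theorem 7.4 Let $x_1,\dots,x_n$ be vectors in a real Banach space, satisfying

$$\Big\|\sum_{i=1}^n a_i x_i\Big\| \le \sqrt{n}\,\Big(\sum_{i=1}^n |a_i|^2\Big)^{1/2} \quad\text{and}\quad \mathbb{E}\Big\|\sum_{i=1}^n g_i x_i\Big\| \ge \delta n \tag{7.2}$$

for all real numbers $(a_i)$ and for some number $\delta > 0$. Then there exist two numbers $s > 0$ and $c\delta \le t \le 1$ connected by the inequality $st \ge c\delta/\log^{3/2}(2/\delta)$ and a subset $\sigma$ of $\{1,\dots,n\}$ of size $|\sigma| \ge s^2 n$ such that

$$\Big\|\sum_{i\in\sigma} a_i x_i\Big\| \ge ct\sum_{i\in\sigma}|a_i| \tag{7.3}$$

for all real numbers $(a_i)$.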
The first assumption in (7.2) is satisfied in particular if $\|x_i\| \le 1$ for all $i$. Also, since $t \le 1$, we always have $s \ge c\delta/\log^{3/2}(2/\delta)$. This instantly recovers Elton's Theorem.

Next, Theorem 7.4 essentially extends the Bourgain-Tzafriri principle of restricted invertibility to operators $T : \ell_2^n \to X$ acting into an arbitrary Banach space $X$. The average size of $T$ is measured by its $\ell$-norm, defined as $\ell(T)^2 = \mathbb{E}\|Tg\|^2$, where $g = (g_1,\dots,g_n)$. If $X$ is a Hilbert space, then $\ell(T)$ equals the Hilbert-Schmidt norm of $T$.
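For a Euclidean target the identity $\ell(T)^2 = \mathbb{E}\|Tg\|^2 = \sum_{i,j} T_{ij}^2$ (the squared Hilbert-Schmidt norm) follows from $\mathbb{E}\,g_i g_j = \delta_{ij}$. The Monte Carlo sketch below, with a hypothetical $3\times 3$ matrix and our own function names, checks this numerically.

```python
# Illustrative sketch; the matrix T and the function names are hypothetical.
import random


def ell_norm_sq_mc(T, trials=20000, seed=2):
    """Monte Carlo estimate of ell(T)**2 = E||T g||_2**2 for a matrix T given as a list of rows."""
    rng = random.Random(seed)
    n = len(T[0])
    total = 0.0
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += sum(sum(tij * gj for tij, gj in zip(row, g)) ** 2 for row in T)
    return total / trials


def hilbert_schmidt_sq(T):
    """Sum of the squared entries of T."""
    return sum(tij * tij for row in T for tij in row)


T = [[1, 2, 0], [0, 1, 0], [3, 0, 1]]
print(ell_norm_sq_mc(T), hilbert_schmidt_sq(T))
```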

Corollary 7.5 (General Principle of the Restricted Invertibility) Let $T : \ell_2^n \to X$ be a linear operator with $\ell(T) \ge \sqrt{n}$, where $X$ is a Banach space. Let $\alpha = c\log^{-3/2}(2\|T\|)$. Then there exists a subset $\sigma$ of $\{1,\dots,n\}$ of size $|\sigma| \ge c\alpha^2 n/\|T\|^2$ and such that

$$\|Tx\| \ge \alpha|\sigma|^{-1/2}\|x\|_{\ell_1} \quad\text{for all } x\in\mathbb{R}^\sigma.$$

If $X$ is a Hilbert space, the condition $\ell(T) \ge \sqrt{n}$ is satisfied, for example, if $\|Te_i\| \ge 1$ for all $i$. In that case $|\sigma|^{-1/2}\|x\|_{\ell_1}$ in the conclusion can be improved to $\|x\| = \|x\|_{\ell_2}$ by Grothendieck factorization (we will do this below). This recovers the Bourgain-Tzafriri Theorem up to the logarithmic factor $\alpha$.

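Proof of Corollary 7.5. We apply Theorem 7.4 to the vectors $x_i = (\sqrt{n}/\|T\|)\,Te_i$, $i = 1,\dots,n$. Then for any $a_1,\dots,a_n$

$$\Big\|\sum_{i=1}^n a_i x_i\Big\| = \frac{\sqrt{n}}{\|T\|}\Big\|T\Big(\sum_{i=1}^n a_i e_i\Big)\Big\| \le \sqrt{n}\,\|(a_1,\dots,a_n)\|_{\ell_2^n}.$$

Since by Kahane's inequality $\ell(T) \le C\,\mathbb{E}\|Tg\|$, the second assumption in (7.2) holds with $\delta = c/\|T\|$. Hence there exist numbers $c/\|T\| \le t \le 1$ and $s$ satisfying

$$st \ge c\delta/\log^{3/2}(2/\delta)$$

and a subset $\sigma$ of $\{1,\dots,n\}$ of size $|\sigma| \ge s^2 n$ so that, by (7.3),

$$\frac{\sqrt{n}}{\|T\|}\Big\|\sum_{i\in\sigma} a_i Te_i\Big\| \ge ct\sum_{i\in\sigma}|a_i| \quad\text{for all real numbers } (a_i).$$

Since $t \le 1$, we have $s \ge c\delta/\log^{3/2}(2/\delta) \ge \alpha/\|T\|$ and consequently $|\sigma| \ge s^2 n \ge \alpha^2 n/\|T\|^2$ as required. As $|\sigma| \ge s^2 n$, we have $s \le \sqrt{|\sigma|/n}$; moreover, $st \ge c\delta/\log^{3/2}(2/\delta)$ with $\delta = c/\|T\|$ gives

$$\frac{1}{\|T\|\,s} \le C\,t\log^{3/2}(2/\delta) \le \frac{c\,t}{\alpha}$$

for a suitable choice of the absolute constant in $\alpha$. Hence

$$\Big\|\sum_{i\in\sigma} a_i Te_i\Big\| \ge \frac{ct\,\|T\|}{\sqrt{n}}\sum_{i\in\sigma}|a_i| \ge ct\,\|T\|\,s\,|\sigma|^{-1/2}\sum_{i\in\sigma}|a_i| \ge \alpha|\sigma|^{-1/2}\sum_{i\in\sigma}|a_i| \quad\text{for all real numbers } (a_i)$$

as required. ■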

To get the actual invertibility of $T : \ell_2^n \to X$ one can use the Grothendieck factorization. Remarkably, this step works not only for $X$ being a Hilbert space but for a much larger class of spaces, namely for those of type 2.

Definition 7.6 A Banach space $X$ has type 2 if there exists a constant $M$ such that the inequality

$$\Big(\mathbb{E}\Big\|\sum_i \varepsilon_i x_i\Big\|^2\Big)^{1/2} \le M\Big(\sum_i \|x_i\|^2\Big)^{1/2}$$

holds for all finite sequences of vectors $(x_i)$ in $X$. The minimal possible constant $M$ is called the type 2 constant of $X$ and is denoted by $T_2(X)$.

Important examples of spaces that have type 2 are all $L_p$-spaces ($2 \le p < \infty$) and their subspaces.
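In a Hilbert space the type 2 inequality is an identity: expanding the square, the cross terms carry the factor $\mathbb{E}\,\varepsilon_i\varepsilon_j = 0$ for $i \ne j$, so $\mathbb{E}\|\sum_i \varepsilon_i x_i\|^2 = \sum_i \|x_i\|^2$ and $T_2(\ell_2) = 1$. The sketch below, our own illustration with hypothetical integer vectors in $\mathbb{R}^2$, verifies this by exact enumeration of all sign patterns.

```python
# Illustrative sketch; the function name and the vectors xs are hypothetical.
import itertools


def rademacher_second_moment(xs):
    """Exact E||sum_i eps_i x_i||_2**2, averaging over all 2**len(xs) sign patterns."""
    d = len(xs[0])
    patterns = list(itertools.product((-1, 1), repeat=len(xs)))
    total = 0
    for eps in patterns:
        s = [sum(e * x[k] for e, x in zip(eps, xs)) for k in range(d)]
        total += sum(c * c for c in s)
    return total / len(patterns)


xs = [(1, 2), (3, -1), (0, 5)]
print(rademacher_second_moment(xs), sum(a * a + b * b for a, b in xs))
```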

Lemma 7.7 (Grothendieck Factorization, see e.g. [LT] 15.4) Let $S : E \to \ell_1^m$ be a linear mapping, where $E$ is a Banach space of type 2. Then there exists a subset $\eta$ of $\{1,\dots,m\}$ of size $|\eta| \ge m/2$ and such that

$$\|P_\eta S\|_{E\to\ell_2^\eta} \le C\,T_2(E)\,m^{-1/2}\,\|S\|_{E\to\ell_1^m}$$

where $P_\eta$ is the coordinate projection in $\mathbb{R}^m$ onto $\mathbb{R}^\eta$.

Applying this lemma to the inverse of T on its range, we obtain 

Corollary 7.8 (Restricted Invertibility under type 2) Let $T : \ell_2^n \to X$ be a linear operator with $\ell(T) \ge \sqrt{n}$, where $X$ is a Banach space of type 2. Let $\alpha = c\log^{-3/2}(2\|T\|)$. Then there exists a subset $\sigma$ of $\{1,\dots,n\}$ of size $|\sigma| \ge c\alpha^2 n/\|T\|^2$ and such that

$$\|Tx\| \ge \alpha\,T_2(X)^{-1}\|x\| \quad\text{for all } x\in\mathbb{R}^\sigma.$$

For $X = \ell_2$ this recovers the Bourgain-Tzafriri Theorem up to the logarithmic factor $\alpha$.

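Proof. By Corollary 7.5 the operator $T$ is invertible on the subspace $E = T(\mathbb{R}^\sigma)$ of $X$, and its inverse $S = T^{-1} : E \to \mathbb{R}^\sigma$ has norm $\|S\|_{E\to\ell_1^\sigma} \le \alpha^{-1}|\sigma|^{1/2}$. By the Grothendieck factorization (note that $T_2(E) \le T_2(X)$ since $E$ is a subspace of $X$), we find a subset $\eta \subset \sigma$ of size $|\eta| \ge \frac12|\sigma|$ and such that

$$\|P_\eta S\|_{E\to\ell_2^\eta} \le C\alpha^{-1}T_2(X).$$

This means that $\|Tx\|_X \ge c\alpha\,T_2(X)^{-1}\|x\|_{\ell_2}$ for all $x \in \mathbb{R}^\eta$. ■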
Finally, the general Principle of the Restricted Invertibility rewritten in geometric terms gives a result related to Dvoretzky's Theorem.

Corollary 7.9 Let $K$ be a symmetric convex body in $\mathbb{R}^n$ containing $B_2^n$. Let $M = M_K\log^{-3/2}(2/M_K)$. Then there exists a subset $\sigma$ of $\{1,\dots,n\}$ of size $|\sigma| \ge cM^2 n$ and such that

$$cM(K\cap\mathbb{R}^\sigma) \subset \sqrt{|\sigma|}\,B_1^\sigma. \tag{7.4}$$

Here $B_1^\sigma$ denotes the unit ball of $\ell_1^\sigma$.

Proof. We apply the general Principle of Restricted Invertibility in the space $X = (\mathbb{R}^n, \|\cdot\|_K)$. We have $\ell(id : \ell_2^n \to X) \ge c\sqrt{n}\,M_K$ (see [TJ] (12.7)) and $\|id : \ell_2^n \to X\| \le 1$ because $K$ contains $B_2^n$. Hence for the operator $T = c(M_K)^{-1}\,id : \ell_2^n \to X$ we have $\ell(T) \ge \sqrt{n}$ and $\|T\| \le C/M_K$. The application of Corollary 7.5 completes the proof. ■



A link to Dvoretzky's Theorem is provided by a result of Kashin [K 77] (see also [S]) that the cross-polytope $\sqrt{k}\,B_1^k$ has a Euclidean section of proportional dimension. Precisely, there exists a subspace $E$ in $\mathbb{R}^k$ of dimension at least $k/2$ and such that

$$(B_2^k\cap E) \subset (\sqrt{k}\,B_1^k\cap E) \subset C(B_2^k\cap E).$$

Actually, a random subspace $E$ taken uniformly in the Grassmannian satisfies this with probability at least $1 - e^{-ck}$.

Taking such a random section of both sides of (7.4), we get $M(K\cap E) \subset C(B_2^n\cap E)$, which recovers the second inclusion in Dvoretzky's Theorem up to a logarithmic factor. The novelty of (7.4) is that the section is coordinate. This might be important for future applications.

Remark. Corollary 7.9 may fail for any set of size $|\sigma| \asymp M^2 n$, even though it must hold for some larger set. Indeed, for $K = a\sqrt{n}\,B_1^n$ with some large parameter $a$ we have $M \asymp a^{-1}\log^{-3/2} a$. Any set $\sigma$ for which Corollary 7.9 holds satisfies

$$\sqrt{|\sigma|}\,B_1^\sigma \supset c\,a^{-1}(\log^{-3/2} a)\,(a\sqrt{n}\,B_1^n\cap\mathbb{R}^\sigma) = c(\log^{-3/2} a)\sqrt{n}\,B_1^\sigma,$$

so $|\sigma| \ge c(\log^{-3} a)\,n$. This is much larger than $M^2 n \asymp a^{-2}(\log^{-3} a)\,n$. In particular, Corollary 7.9 fails for any set of size $|\sigma| \asymp M^2 n$.

Proof of Theorem 7.4. By a slight perturbation we may assume that the vectors $x_i$ are linearly independent, and by applying an appropriate linear transformation we may further assume that $X = (\mathbb{R}^n, \|\cdot\|_K)$, where $K$ is a symmetric convex body in $\mathbb{R}^n$, and that $x_i = e_i$, the canonical vector basis in $\mathbb{R}^n$. We then rewrite the assumptions as $\frac{1}{\sqrt{n}}B_2^n \subset K$, $\mathbb{E}\|g\|_K \ge \delta n$. Then for the polar body $A = K^\circ = \{x : \langle x,y\rangle \le 1\ \forall y\in K\}$ we have

$$A \subset \sqrt{n}\,B_2^n, \qquad E := \mathbb{E}\sup_{x\in A}\sum_{i=1}^n g_i x(i) \ge \delta n. \tag{7.5}$$

Although Theorem 6.5 can be used to estimate $E$, we will need to have some control on the upper limit in the integral (6.9). This can be done as follows. Noting that $E(A) = n^{-1/2}E$, we bound the expectation in (7.5) by Dudley's entropy inequality (6.10):

$$E \le C\sqrt{n}\int_{cE/n}^1 \sqrt{\log N(A,tD)}\,dt \tag{7.6}$$

where $D = \mathrm{Ball}(L_2) = \sqrt{n}\,B_2^n$. The upper limit in the integral is 1 because $A \subset D$,
so the integrand vanishes for t > 1. By Theorem 14. H 

N{A,tD) = N{t-^A,D) < i:{Ct-^Af. 

Since Ct~i^ Q Ct^^ ■ BaU(Lf ), Lemma lO gives 

where v{t) = v{Ct'~^A). Hence 

Cn 



log N{A,tD) < Cv{t)\og 



Using this in Dudley's entropy inequality (7.6), we obtain



E<C^f v{t)\og(^).dt 

JcE/n V \tv{t)J 



Let $s(t)^2 = v(t)/n$. Since $s(t) \le 1$ and $E \ge \delta n$, we have



M<£.(Jiog(^ 



dt. 



Comparing the integrand to that of 



$$\log(1/c\delta) = \int_{c\delta}^1 \frac{dt}{t}$$
we conclude that there exists a number $c\delta \le t \le 1$ such that



ds{t)J ~ t\og{l/c8) 
Multiplying both sides by t, we obtain 



, ^ c5 /I /los(l/c6)\ c5 



log(lM)/ V ^ V cS ) - log3/2(2/<^)- 

It remains to interpret $v(t)$. By the symmetry of $A$, $v(t)$ is the maximal rank of a coordinate projection $P$ in $\mathbb{R}^n$ such that $P(Ct^{-1}A) \supset P(\frac12 B_\infty^n)$. Let $\mathbb{R}^\sigma$ be the range of $P$; then $|\sigma| = v(t) = s(t)^2 n$. By duality, the inclusion above is equivalent to $C^{-1}t\,K\cap\mathbb{R}^\sigma \subset 2B_1^\sigma$. Equivalently, $\|x\|_K \ge (2C)^{-1}t\,\|x\|_{\ell_1}$ for all $x \in \mathbb{R}^\sigma$. This is precisely the conclusion (7.3). The proof is complete. ■
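The duality step uses that the polar of the cube $\frac12 B_\infty^\sigma$ is $2B_1^\sigma$: the support function $\sup_{y\in\frac12 B_\infty}\langle x,y\rangle$ equals $\frac12\|x\|_1$, because a linear form on a cube attains its maximum at a vertex. A tiny numerical check of this identity, our own illustration with a hypothetical helper name:

```python
# Illustrative sketch; the helper name is hypothetical, not from the paper.
import itertools


def support_half_cube(x):
    """sup of <x, y> over y in (1/2) B_inf^n, computed over the 2**n vertices of the cube."""
    return max(sum(xi * yi for xi, yi in zip(x, y))
               for y in itertools.product((-0.5, 0.5), repeat=len(x)))


x = [1, -2, 3]
print(support_half_cube(x), 0.5 * sum(abs(v) for v in x))  # both equal half the l_1 norm
```

So $x$ lies in the polar (support function at most 1) exactly when $\|x\|_1 \le 2$.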

Remark. Although the first assumption in (7.2) is rather nonrestrictive, it can be further weakened. Tracing where it was used in the proof (in Lemma 4.7), we see that only "average" volumetric properties of $K$ matter. We leave the details to the interested reader.